Internet Engineering Task Force (IETF) H. Birkholz 


Request for Comments: 8610 Fraunhofer SIT 
Category: Standards Track C. Vigano 
ISSN: 2070-1721 Universitaet Bremen 


C. Bormann 
Universitaet Bremen TZI 
June 2019 


Concise Data Definition Language (CDDL): A Notational Convention 
to Express Concise Binary Object Representation (CBOR) 
and JSON Data Structures 


Abstract 
This document proposes a notational convention to express Concise 
Binary Object Representation (CBOR) data structures (RFC 7049). Its 
main goal is to provide an easy and unambiguous way to express 
structures for protocol messages and data formats that use CBOR or 
JSON. 


1. Introduction 


In this document, a notational convention to express Concise Binary 
Object Representation (CBOR) data structures [RFC7049] is defined. 


The main goal for the convention is to provide a unified notation 
that can be used when defining protocols that use CBOR. We term the 
convention "Concise Data Definition Language", or CDDL. 


The CBOR notational convention has the following goals: 


(G1) Provide an unambiguous description of the overall structure of 
a CBOR data item. 


(G2) Be flexible in expressing the multiple ways in which data can 
be represented in the CBOR data format. 


(G3) Be able to express common CBOR datatypes and structures. 


(G4) Provide a single format that is both readable and editable for 
humans and processable by a machine. 


(G5) Enable automatic checking of CBOR data items for data format 
compliance. 


(G6) Enable extraction of specific elements from CBOR data for 
further processing. 


Not an original goal per se, but a convenient side effect of the JSON 
generic data model being a subset of the CBOR generic data model, is 
the fact that CDDL can also be used for describing JSON data 
structures (see Appendix E). 


This document has the following structure: 


The syntax of CDDL is defined in Section 3. Examples of CDDL and a 
related CBOR data item ("instance"), some of which use the JSON form, 
are described in Appendix H. Section 4 discusses usage of CDDL. 
Examples are provided throughout the text to better illustrate 
concept definitions. A formal definition of CDDL using ABNF grammar 
[RFC5234] is provided in Appendix B. Finally, a _prelude_ of 
standard CDDL definitions that is automatically prepended to, and 
thus available in, every CDDL specification is listed in Appendix D. 


1.1. Requirements Notation 


The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 
"OPTIONAL" in this document are to be interpreted as described in 
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all 
capitals, as shown here. 


1.2. Terminology 


New terms are introduced in _cursive_, which is rendered in plain 
text as the new term surrounded by underscores.  CDDL text in the 
running text is in "typewriter", which is rendered in plain text as 
the CDDL text in double quotes (double quotes are also used in the 
usual English sense; the reader is expected to disambiguate this by 
context). 


In this specification, the term "byte" is used in its now-customary 
sense as a synonym for "octet". 


2. The Style of Data Structure Specification 


CDDL focuses on styles of specification that are in use in the 
community employing the data model as pioneered by JSON and now 
refined in CBOR. 


There are a number of more or less atomic elements of a CBOR data 
model, such as numbers, simple values (false, true, nil), text 
strings, and byte strings; CDDL does not focus on specifying their 
structure. CDDL of course also allows adding a CBOR tag to a 
data item. 


Beyond those atomic elements, further components of a data structure 
definition language are the datatypes used for composition: arrays 
and maps in CBOR (called "arrays" and "objects" in JSON). While 
these are only two representation formats, they are used to specify 
four loosely distinguishable styles of composition: 


o A _vector_: an array of elements that are mostly of the same 
semantics. The set of signatures associated with a signed data 
item is a typical application of a vector. 


o A _record_: an array the elements of which have different, 
positionally defined semantics, as detailed in the data structure 
definition. A 2D point, specified as an array of an x coordinate 
(which comes first) and a y coordinate (coming second), is an 
example of a record, as is the pair of exponent (first) and 
mantissa (second) in a CBOR decimal fraction. 


Birkholz, et al. Standards Track [Page 5] 


RFC 8610 CDDL June 2019 


o A _table_: a map from a domain of map keys to a domain of map 
values, that are mostly of the same semantics. A set of language 
tags, each mapped to a text string translated to that specific 
language, is an example of a table. The key domain is usually not 
limited to a specific set by the specification but is open for the 
application, e.g., in a table mapping IP addresses to Media Access 
Control (MAC) addresses, the specification does not attempt to 
foresee all possible IP addresses. In a language such as 
JavaScript, a "Map" (as opposed to a plain "Object") would often 
be employed to achieve the generality of the key domain. 


o A _struct_: a map from a domain of map keys as defined by the 
specification to a domain of map values the semantics of each of 
which is bound to a specific map key. This is what many people 
have in mind when they think about JSON objects; CBOR adds the 
ability to use map keys that are not just text strings. Structs 
can be used to solve problems similar to those records are used 
for; the use of explicit map keys facilitates optionality and 
extensibility. 


Two important concepts provide the foundation for CDDL: 


1. Instead of defining all four types of composition in CDDL 
separately, or even defining one kind for arrays (vectors and 
records) and one kind for maps (tables and structs), there is 
only one kind of composition in CDDL: the _group_ (Section 2.1). 


2. The other important concept is that of a _type_. The entire CDDL 
specification defines a type (the one defined by its first 
_rule_), which formally is the set of CBOR data items that are 
acceptable as "instances" for this specification.  CDDL 
predefines a number of basic types such as "uint" (unsigned 
integer) or "tstr" (text string), often making use of a simple 
formal notation for CBOR data items. Each value that can be 
expressed as a CBOR data item is also a type in its own right, 
e.g., "1". A type can be built as a choice of other types, 
e.g., an "int" is either a "uint" or a "nint" (negative integer). 
Finally, a type can be built as an array or a map from a group. 


The rest of this section introduces a number of basic concepts of 
CDDL, and Section 3 defines additional syntax. Appendix C gives a 
concise summary of the semantics of CDDL. 


2.1. Groups and Composition in CDDL 


CDDL groups are lists of _group entries_, each of which can be a 
name/value pair or a more complex group expression (which then in 
turn stands for a sequence of name/value pairs). A CDDL group is a 
production in a grammar that matches certain sequences of name/value 
pairs but not others. The grammar is based on the concepts of 
Parsing Expression Grammars (PEGs) (see Appendix A). 


In an array context, only the value of the name/value pair is 
represented; the name is annotation only (and can be left off from 
the group specification if not needed). In a map context, the names 
become the map keys ("member keys"). 


In an array context, the actual sequence of elements in the group is 
important, as that sequence is the information that allows 
associating actual array elements with entries in the group. In a 
map context, the sequence of entries in a group is not relevant (but 
there is still a need to write down group entries in a sequence). 


An array matches a specification given as a group when the group 
matches a sequence of name/value pairs the value parts of which 
exactly match the elements of the array in order. 


A map matches a specification given as a group when the group matches 
a sequence of name/value pairs such that all of these name/value 
pairs are present in the map and the map has no name/value pair that 
is not covered by the group. 


A simple example of using a group directly in a map definition is: 


person = { 
  age: int, 
  name: tstr, 
  employer: tstr, 
} 


Figure 1: Using a Group Directly in a Map 


The three entries of the group are written between the curly braces 
that create the map: here, "age", "name", and "employer" are the 
names that turn into the map key text strings, and "int" and "tstr" 
(text string) are the types of the map values under these keys. 
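As an informal illustration (not part of this specification), the map-matching rule can be sketched in Python for the "person" map of Figure 1. The names PERSON and matches_struct are hypothetical, and the sketch assumes that every entry is mandatory and keyed by a text string; occurrence indicators and group choices are not modeled.

```python
# Illustrative sketch only: checks a decoded map (a Python dict) against the
# "person" group of Figure 1. PERSON and matches_struct are hypothetical names.

def is_int(value):
    # bool is a subclass of int in Python; CDDL "int" does not include booleans.
    return isinstance(value, int) and not isinstance(value, bool)


def is_tstr(value):
    return isinstance(value, str)


# Each group entry: member key -> predicate for the value type.
PERSON = {"age": is_int, "name": is_tstr, "employer": is_tstr}


def matches_struct(group, data):
    """True if every entry of the group is present in the map with a value of
    the right type and the map has no member that the group does not cover."""
    if not isinstance(data, dict) or set(data) != set(group):
        return False
    return all(check(data[key]) for key, check in group.items())
```

A map with a missing "employer" member, or with an additional member, is rejected by this sketch, mirroring the rule that all name/value pairs of the group are present and nothing else is.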


A group by itself (without creating a map around it) can be placed in 
(round) parentheses and given a name by using it in a rule: 


pii = ( 
  age: int, 
  name: tstr, 
  employer: tstr, 
) 


Figure 2: A Basic Group 


This separate, named group definition allows us to rephrase 
Figure 1 as: 


person = { 
  pii 
} 


Figure 3: Using a Group by Name 


Note that the (curly) braces signify the creation of a map; the 
groups themselves are neutral as to whether they will be used in a 
map or an array. 


As shown in Figure 1, the parentheses for groups are optional when 
there is some other set of brackets present. Note that they can 
still be used, leading to this not-so-realistic, but perfectly valid, 
example: 


person = {( 
  age: int, 
  name: tstr, 
  employer: tstr, 
)} 


Figure 4: Using a Parenthesized Group in a Map 


Groups can be used to factor out common parts of structs, e.g., 
instead of writing specifications in copy/paste style, such as in 
Figure 5, one can factor out the common subgroup, choose a name for 
it, and write only the specific parts into the individual maps 
(Figure 6). 


person = { 
  age: int, 
  name: tstr, 
  employer: tstr, 
} 

dog = { 
  age: int, 
  name: tstr, 
  leash-length: float, 
} 


Figure 5: Maps with Copy/Paste 


person = { 
  identity, 
  employer: tstr, 
} 

dog = { 
  identity, 
  leash-length: float, 
} 

identity = ( 
  age: int, 
  name: tstr, 
) 


Figure 6: Using a Group for Factorization 


Note that the lists inside the braces in the above definitions 
constitute (anonymous) groups, while "identity" is a named group, 
which can then be included as part of other groups (anonymous as in 
the example, or themselves named). 


2.1.1. Usage 


Groups are the instrument used in composing data structures with 
CDDL. It is a matter of style in defining those structures whether 
to define groups (anonymously) right in their contexts or whether to 
define them in a separate rule and to reference them with their 
respective name (possibly more than once). 


With this, one is allowed to define all small parts of their data 
structures and compose bigger protocol data units with those or to 
have only one big protocol data unit that has all definitions ad hoc 
where needed. 


2.1.2. Syntax 


The composition syntax is intended to be concise and easy to read: 


o The start and end of a group can be marked by "(" and ")". 


o Definitions of entries inside of a group are noted as follows: 
_keytype => valuetype,_ (read "keytype maps to valuetype"). The 
comma is actually optional (not just in the final entry), but it 
is considered good style to set it. The double arrow can be 
replaced by a colon in the common case of directly using a text 
string or integer literal as a key; see Section 3.5.1. This is 
also the common way of naming elements of an array just for 
documentation; see Section 3.4. 


A basic entry consists of a _keytype_ and a _valuetype_, both of 
which are types (Section 2.2); this entry matches any name/value pair 
the name of which is in the keytype and the value of which is in the 
valuetype. 


A group defined as a sequence of group entries matches any sequence 
of name/value pairs that is composed by concatenation in order of 
what the entries match. 


A group definition can also contain choices between groups; see 
Section 2.2.2. 


2.2. Types 


2.2.1. Values 


Values such as numbers and strings can be used in place of a type. 
(For instance, this is a very common thing to do for a key type, 
common enough that CDDL provides additional convenience syntax 
for this.) 


The value notation is based on the C language, but does not offer all 
the syntactic variations (see Appendix B for details). The value 
notation for numbers inherits from C the distinction between integer 
values (no fractional part or exponent given -- NR1 [ISO6093]; 
"NR" stands for "numerical representation") and floating-point values 
(where a fractional part, an exponent, or both are present -- NR2 or 
NR3), so the type "1" does not include any floating-point numbers 
while the types "1e3" and "1.5" are both floating-point numbers and 
do not include any integer numbers. 


2.2.2. Choices 


Many places that allow a type also allow a choice between types, 
delimited by a "/" (slash). The entire choice construct can be put 
into parentheses if this is required to make the construction 
unambiguous (please see Appendix B for details of the CDDL grammar). 


Choices of values can be used to express enumerations: 


attire = "bow tie" / "necktie" / "Internet attire" 
protocol = 6 / 17 


Analogous to types, CDDL also allows choices between groups, 
delimited by a "//" (double slash). Note that the "//" operator 
binds much more weakly than the other CDDL operators, so each line 
within "delivery" in the following example is its own alternative in 
the group choice: 


address = { delivery } 

delivery = ( 
  street: tstr, ? number: uint, city // 
  po-box: uint, city // 
  per-pickup: true 
) 

city = ( 
  name: tstr, zip-code: uint 
) 


A group choice matches the union of the sets of name/value pair 
sequences that the alternatives in the choice can. 


For both type choices and group choices, additional alternatives can 
be added to a rule later in separate rules by using "/=" and "//=", 
respectively, instead of "=": 


attire /= "swimwear" 

delivery //= ( 
  lat: float, long: float, drone-type: tstr 
) 


It is not an error if a name is first used with a "/=" or "//=" 
(there is no need to "create it" with "="). 


2.2.2.1. Ranges 


Instead of naming all the values that make up a choice, CDDL allows 
building a range out of two values that are in an ordering 
relationship: a lower bound (first value) and an upper bound (second 
value). A range can be inclusive of both bounds given (denoted by 
joining two values by ".."), or it can include the lower bound and 
exclude the upper bound (denoted by instead using "..."). If the 
lower bound exceeds the upper bound, the resulting type is the empty 
set (this behavior can be desirable when generics (Section 3.10) are 
being used). 


device-address = byte 
max-byte = 255 
byte = 0..max-byte ; inclusive range 

first-non-byte = 256 
byte1 = 0...first-non-byte ; byte1 is equivalent to byte 


CDDL currently only allows ranges between integers (matching integer 
values) or between floating-point values (matching floating-point 
values). If both are needed in a type, a type choice between the two 
kinds of ranges can be (clumsily) used: 


int-range = 0..10 ; only integers match 
float-range = 0.0..10.0 ; only floats match 
BAD-range1 = 0..10.0 ; NOT DEFINED 
BAD-range2 = 0.0..10 ; NOT DEFINED 
numeric-range = int-range / float-range 


(See also the control operators .lt/.ge and .le/.gt in 
Section 3.8.6.) 
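The difference between the two range operators, and the rule that integer ranges match only integers while floating-point ranges match only floating-point values, can be sketched informally in Python. The helper in_range is hypothetical and is not part of CDDL or of any CDDL tool; Python int and float stand in for the two kinds of values.

```python
# Illustrative sketch only; in_range is a hypothetical helper, not a CDDL API.

def in_range(value, low, high, inclusive=True):
    """Model "low..high" (inclusive=True) or "low...high" (inclusive=False).

    Integer bounds match only integers and floating-point bounds match only
    floating-point values; mixed bounds are not defined and raise ValueError.
    """
    if type(low) is not type(high) or type(low) not in (int, float):
        raise ValueError("range bounds must both be int or both be float")
    if type(value) is not type(low):
        return False
    if inclusive:
        return low <= value <= high
    return low <= value < high
```

With a lower bound that exceeds the upper bound, the comparison can never succeed, so the sketch yields the empty set as described above.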


Note that the dot is a valid name continuation character in CDDL, so 


min..max 


is not a range expression but a single name. When using a name as 
the left-hand side of a range operator, use spacing as in 


min .. max 


to separate off the range operator. 


2.2.2.2. Turning a Group into a Choice 

Some choices are built out of large numbers of values, often 
integers, each of which is best given a semantic name in the 
specification. Instead of naming each of these integers and then 
accumulating them into a choice, CDDL allows building a choice from a 
group by prefixing it with an "&" character: 


terminal-color = &basecolors 
basecolors = ( 
  black: 0, red: 1, green: 2, yellow: 3, 
  blue: 4, magenta: 5, cyan: 6, white: 7, 
) 
extended-color = &( 
  basecolors, 
  orange: 8, pink: 9, purple: 10, brown: 11, 
) 


As with the use of groups in arrays (Section 3.4), the member names 
have only documentary value (in particular, they might be used by a 
tool when displaying integers that are taken from that choice). 
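As an informal illustration, the effect of turning a group into a choice can be modeled in Python by treating the group as a mapping from documentary names to values. The names BASECOLORS, EXTENDED, choice_from_group, and display_name are hypothetical; the sketch only covers groups whose entries are single values, as in the color example.

```python
# Illustrative sketch only: models a group-to-choice conversion as the set of
# values found in the group. All names here are hypothetical.

BASECOLORS = {"black": 0, "red": 1, "green": 2, "yellow": 3,
              "blue": 4, "magenta": 5, "cyan": 6, "white": 7}

# extended-color includes basecolors plus four more entries.
EXTENDED = {**BASECOLORS, "orange": 8, "pink": 9, "purple": 10, "brown": 11}


def choice_from_group(group):
    """Return the set of values accepted by the choice built from the group."""
    return set(group.values())


def display_name(group, value):
    """Return the documentary member name for a value, or None if the value
    is not part of the choice (as a display tool might do)."""
    for name, member_value in group.items():
        if member_value == value:
            return name
    return None
```

Only the values take part in matching; the names are used here solely for display, in line with their documentary role.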


2.2.3. Representation Types 


CDDL allows the specification of a data item type by referring to the 
CBOR representation (specifically, to major types and additional 
information; see Section 2 of [RFC7049]). How this is used should be 
evident from the prelude (Appendix D): a hash mark ("#") optionally 
followed by a number from 0 to 7 identifying the major type, which 
then can be followed by a dot and a number specifying the additional 
information. This construction specifies the set of values that can 
be serialized in CBOR (i.e., "any"), by the given major type if one 
is given, or by the given major type with the additional information 
if both are given. Where a major type of 6 (Tag) is used, the type 
of the tagged item can be specified by appending it in parentheses. 
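For readers less familiar with the CBOR encoding, the following Python sketch shows how the two numbers used in this notation relate to the initial byte of an encoded data item as described in Section 2 of [RFC7049]: the major type is carried in the three high-order bits and the additional information in the five low-order bits. The function name split_initial_byte is hypothetical, and the sketch is an illustration rather than a CBOR decoder.

```python
# Illustrative sketch only; split_initial_byte is a hypothetical helper.

def split_initial_byte(initial_byte):
    """Split a CBOR initial byte into (major type, additional information)."""
    if not 0 <= initial_byte <= 0xFF:
        raise ValueError("initial byte must be in the range 0..255")
    major_type = initial_byte >> 5          # three high-order bits, 0..7
    additional_info = initial_byte & 0x1F   # five low-order bits, 0..31
    return major_type, additional_info
```

For example, the initial byte 0xf9 yields major type 7 with additional information 25, the combination that the notation writes after the hash mark for half-precision floats.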


Note that although this notation is based on the CBOR serialization, 
it is about a set of values at the data model level, e.g., "#7.25" 
specifies the set of values that can be represented as half-precision 
floats; it does not mandate that these values also do have to be 
serialized as half-precision floats: CDDL does not provide any 
language means to restrict the choice of serialization variants. 
This also enables the use of CDDL with JSON, which uses a 
fundamentally different way of serializing (some of) the same values. 


It may be necessary to make use of representation types outside the 
prelude, e.g., a specification could start by making use of an 
existing tag in a more specific way or could define a new tag not 
defined in the prelude: 


my_breakfast = #6.55799(breakfast) ; cbor-any is too general! 
breakfast = cereal / porridge 
cereal = #6.998(tstr) 
porridge = #6.999([liquid, solid]) 
liquid = milk / water 
milk = 0 
water = 1 
solid = tstr 


2.2.4. Root Type 


There is no special syntax to identify the root of a CDDL data 
structure definition: that role is simply taken by the first rule 
defined in the file. 


This is motivated by the usual top-down approach for defining data 
structures, decomposing a big data structure unit into smaller parts; 
however, except for the root type, there is no need to strictly 
follow this sequence. 


(Note that there is no way to use a group as a root -- it must be 
a type.) 




3. Syntax 


In this section, the overall syntax of CDDL is shown, alongside some 
examples just illustrating syntax. (The definition does not attempt 
to be overly formal; refer to Appendix B for details.) 


3.1. General Conventions 


The basic syntax is inspired by ABNF [RFC5234], with the following: 


o Rules, whether they define groups or types, are defined with a 
name, followed by an equals sign "=" and the actual definition 
according to the respective syntactic rules of that definition. 


o A name can consist of any of the characters from the set {"A" to 
"Z", "a" to "z", "0" to "9", "_", "-", "@", ".", "$"}, starting 
with an alphabetic character (including "@", "_", "$") and ending 
in such a character or a digit. 


   * Names are case sensitive. 


   * It is preferred style to start a name with a lowercase letter. 


   * The hyphen is preferred over the underscore (except in a 
   "bareword" (Section 3.5.1), where the semantics may actually 
   require an underscore). 


   * The period may be useful for larger specifications, to express 
   some module structure (as in "tcp.throughput" vs. 
   "udp.throughput"). 


   * A number of names are predefined in the CDDL prelude, as listed 
   in Appendix D. 


   * Rule names (types or groups) do not appear in the actual CBOR 
   encoding, but names used as "barewords" in member keys do. 


o Comments are started by a ";" (semicolon) character and finish at 
the end of a line (LF or CRLF). 


o Except within strings, whitespace (spaces, newlines, and comments) 
is used to separate syntactic elements for readability (and to 
separate identifiers, range operators, or numbers that follow each 
other); it is otherwise completely optional. 


o Hexadecimal numbers are preceded by "0x" (without quotes) and are 
case insensitive. Similarly, binary numbers are preceded by "0b". 


o Text strings are enclosed by double quotation '"' characters. 
They follow the conventions for strings as defined in Section 7 of 
[RFC8259]. (ABNF users may want to note that there is no support 
in CDDL for the concept of case insensitivity in text strings; if 
necessary, regular expressions can be used (Section 3.8.3).) 


o Byte strings are enclosed by single quotation "'" characters and 
may be prefixed by "h" or "b64". If unprefixed, the string is 
interpreted as with a text string, except that single quotes must 
be escaped and that the resulting UTF-8 bytes are marked as a byte 
string (major type 2). If prefixed as "h" or "b64", the string is 
interpreted as a sequence of pairs of hex digits (base16; see 
Section 8 of [RFC4648]) or a base64(url) string (Section 4 or 
Section 5 of [RFC4648]), respectively (as with the diagnostic 
notation in Section 6 of [RFC7049]; cf. Appendix G.2); any 
whitespace present within the string (including comments) is 
ignored in the prefixed case. 


o CDDL uses UTF-8 [RFC3629] for its encoding. Processing of CDDL 
does not involve Unicode normalization processes. 


Example: 


; This is a comment 
person = { g } 

g = ( 
  "name": tstr, 
  age: int, ; "age" is a bareword 
) 


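The naming rule in the conventions above (a name starts with an alphabetic character, "@", "_", or "$", may continue with those characters, digits, "-", and ".", and ends in an alphabetic character or a digit) can be sketched informally as a regular expression in Python. The names _NAME and is_cddl_name are hypothetical, and the pattern is derived from the prose in this section only; the ABNF in Appendix B remains the authoritative definition.

```python
import re

# Illustrative sketch only, derived from the prose description of names.
# First character: letter, "@", "_", or "$". Any run of "-" or "." must be
# followed by a letter, digit, "@", "_", or "$", so a name cannot end in
# "-" or ".".
_NAME = re.compile(r"[A-Za-z@_$](?:[-.]*[A-Za-z0-9@_$])*")


def is_cddl_name(text):
    """True if text has the shape of a CDDL name under the assumptions above."""
    return _NAME.fullmatch(text) is not None
```

Under this sketch, "tcp.throughput" and "leash-length" are accepted, while a string that starts with a digit or ends in a hyphen is rejected.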
3.2. Occurrence 


An optional occurrence indicator can be given in front of a group 
entry. It is either (1) one of the characters "?" (optional), "*" 
(zero or more), or "+" (one or more) or (2) of the form n*m, where n 
and m are optional unsigned integers and n is the lower limit 
(default 0) and m is the upper limit (default no limit) of 
occurrences. 


If no occurrence indicator is specified, the group entry is to occur 
exactly once (as if 1*1 were specified). A group entry with an 
occurrence indicator matches sequences of name/value pairs that are 
composed by concatenating a number of sequences that the basic group 
entry matches, where the number needs to be allowed by the occurrence 
indicator. 
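The mapping from an occurrence indicator to its lower and upper limits can be sketched informally in Python. The function occurrence_bounds is hypothetical and is not taken from any CDDL tool; an upper limit of None stands for "no limit", and an absent indicator is passed as None.

```python
import re

# Illustrative sketch only; occurrence_bounds is a hypothetical helper.
# Matches the n*m form, where both n and m are optional unsigned integers.
_N_STAR_M = re.compile(r"([0-9]*)\*([0-9]*)")


def occurrence_bounds(indicator=None):
    """Return (lower, upper) for an occurrence indicator; upper is None when
    there is no upper limit."""
    if indicator is None:
        return (1, 1)          # no indicator: exactly once
    if indicator == "?":
        return (0, 1)          # optional
    if indicator == "+":
        return (1, None)       # one or more
    match = _N_STAR_M.fullmatch(indicator)
    if match is None:
        raise ValueError(f"not an occurrence indicator: {indicator!r}")
    lower = int(match.group(1)) if match.group(1) else 0
    upper = int(match.group(2)) if match.group(2) else None
    return (lower, upper)
```

Note that the plain "*" indicator falls out of the n*m form with both limits omitted, giving a lower limit of 0 and no upper limit.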


Note that CDDL, outside any directives/annotations that could 
possibly be defined, does not make any prescription as to whether 
arrays or maps use definite-length or indefinite-length encoding. 
That is, there is no correlation between leaving the size of an array 
"open" in the spec and the fact that it is then interchanged with 
definite or indefinite length. 


Please also note that CDDL can describe flexibility that the data 
model of the target representation does not have. This is rather 
obvious for JSON but is also relevant for CBOR: 


apartment = { 
  kitchen: size, 
  * bedroom: size, 
} 
size = float ; in m2 


The previous specification does not mean that CBOR is changed to 
allow using the key "bedroom" more than once. In other words, due to 
the restrictions imposed by the data model, the third line pretty 
much turns into: 


  ? bedroom: size, 


(Occurrence indicators beyond one are still useful in maps for groups 
that allow a variety of keys.) 


3.3. Predefined Names for Types 


CDDL predefines a number of names. This subsection summarizes these 
names, but please see Appendix D for the exact definitions. 


The following keywords for primitive datatypes are defined: 


"bool" Boolean value (major type 7, additional information 20 
or 21). 


"uint" An unsigned integer (major type 0). 
"nint" A negative integer (major type 1). 
"int" An unsigned integer or a negative integer. 


"float16" A number representable as a half-precision float [IEEE754] 
(major type 7, additional information 25). 


"float32" A number representable as a single-precision float 
[IEEE754] (major type 7, additional information 26). 


"float64" A number representable as a double-precision float 
[IEEE754] (major type 7, additional information 27). 


"float" One of float16, float32, or float64. 
"bstr" or "bytes" A byte string (major type 2). 
"tstr" or "text" Text string (major type 3). 


(Note that there are no predefined names for arrays or maps; these 
are defined with the syntax given below.) 


In addition, a number of types are defined in the prelude that are 
associated with CBOR tags, such as "tdate", "bigint", "regexp", etc. 


3.4. Arrays 


Array definitions surround a group with square brackets. 


For each entry, an occurrence indicator as specified in Section 3.2 
is permitted. 


For example: 


unlimited-people = [* person] 
one-or-two-people = [1*2 person] 
at-least-two-people = [2* person] 

person = ( 
  name: tstr, 
  age: uint, 
) 


The group "person" is defined in such a way that repeating it in the 
array each time generates alternating names and ages, so these are 
four valid values for a data item of type "unlimited-people": 


["roundlet", 1047, "psychurgy", 2204, "extrarhythmical", 2231] 
[] 
["aluminize", 212, "climograph", 4124] 
["penintime", 1513, "endocarditis", 4084, "impermeator", 1669, 
 "coextension", 865] 
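As an informal illustration, matching an array against a repeated two-entry group such as "person" can be sketched in Python. The names PERSON and matches_repeated_group are hypothetical, and the sketch assumes that every entry of the group occurs exactly once per repetition, so that each repetition has a fixed width; general CDDL groups need the full PEG-based matching of Appendix A.

```python
# Illustrative sketch only; all names here are hypothetical.

def is_tstr(value):
    return isinstance(value, str)


def is_uint(value):
    # bool is a subclass of int in Python and is excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


# person = (name: tstr, age: uint); in an array, only the value types matter.
PERSON = (is_tstr, is_uint)


def matches_repeated_group(array, group, lower=0, upper=None):
    """True if array consists of between lower and upper (None: no limit)
    back-to-back repetitions of the fixed-width group."""
    if not isinstance(array, list):
        return False
    width = len(group)
    if len(array) % width != 0:
        return False
    count = len(array) // width
    if count < lower or (upper is not None and count > upper):
        return False
    return all(group[i % width](item) for i, item in enumerate(array))
```

With the default limits, the sketch corresponds to "unlimited-people"; passing lower=1 and upper=2 corresponds to "one-or-two-people", and lower=2 to "at-least-two-people".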


3.5. Maps 


The syntax for specifying maps merits special attention, as well as a 
number of optimizations and conveniences, as it is likely to be the 
focal point of many specifications employing CDDL. While the syntax 
does not strictly distinguish struct and table usage of maps, it 
caters specifically to each of them. 


But first, let's reiterate a feature of CBOR that it has inherited 
from JSON: the key/value pairs in CBOR maps have no fixed ordering. 
(One could imagine situations where fixing the ordering may be of 
use. For example, a decoder could look for values related with 
integer keys 1, 3, and 7. If the order were fixed and the decoder 
encounters the key 4 without having encountered key 3, it could 
conclude that key 3 is not available without doing more complicated 
bookkeeping. Unfortunately, neither JSON nor CBOR supports this, so 
no attempt was made to support this in CDDL either.) 


3.5.1. Structs 


The "struct" usage of maps is similar to the way JSON objects are 
used in many JSON applications. 


A map is defined in the same way as that for defining an array (see 
Section 3.4), except for using curly braces "{}" instead of square 
brackets "[]". 


An occurrence indicator as specified in Section 3.2 is permitted for 
each group entry. 


The following is an example of a record with a structure embedded: 


Geography = [ 
  city           : tstr, 
  gpsCoordinates : GpsCoordinates, 
] 

GpsCoordinates = { 
  longitude      : uint,            ; degrees, scaled by 10^7 
  latitude       : uint,            ; degrees, scaled by 10^7 
} 


When encoding, the Geography record is encoded using a CBOR array 
with two members (the keys for the group entries are ignored), 
whereas the GpsCoordinates structure is encoded as a CBOR map with 
two key/value pairs. 


Types used in a structure can be defined in separate rules or just in 
place (potentially placed inside parentheses, such as for choices). 
For example: 


located-samples = { 
  sample-point: int, 
  samples: [+ float], 
} 


where "located-samples" is the datatype to be used when referring to 
the struct, and "sample-point" and "samples" are the keys to be used. 
This is actually a complete example: an identifier that is followed 
by a colon can be directly used as the text string for a member key 
(we speak of a "bareword" member key), as can a double-quoted string 
or a number. (When other types -- in particular, types that contain 
more than one value -- are used as the types of keys, they are 
followed by a double arrow; see below.) 


If a text string key does not match the syntax for an identifier (or 
if the specifier just happens to prefer using double quotes), the 
text string syntax can also be used in the member key position, 
followed by a colon. The above example could therefore have been 
written with quoted strings in the member key positions. 


More generally, types specified in ways other than those listed for 
the cases described above can be used in a key-type position by 
following them with a double arrow -- in particular, the double arrow 
is necessary if a type is named by an identifier (which, when 
followed by a colon, would be interpreted as a "bareword" and turned 
into a text string). A literal text string also gives rise to a type 
(which contains a single value only -- the given string), so another 
form for this example is: 


located-samples = { 
  "sample-point" => int, 
  "samples" => [+ float], 
} 


See Section 3.5.4 below for how the colon (":") shortcut described 
here also adds some implied semantics. 


A better way to demonstrate the use of the double arrow may be: 


located-samples = { 
  sample-point: int, 
  samples: [+ float], 
  * equipment-type => equipment-tolerances, 
} 
equipment-type = [name: tstr, manufacturer: tstr] 
equipment-tolerances = [+ [float, float]] 


The example below defines a struct with optional entries: display 
name (as a text string), the name components first name and family 
name (as text strings), and age information (as an unsigned integer). 


PersonalData = { 
  ? displayName: tstr, 
  NameComponents, 
  ? age: uint, 
} 

NameComponents = ( 
  ? firstName: tstr, 
  ? familyName: tstr, 
) 


Note that the group definition for NameComponents does not generate 
another map; instead, all four keys are directly in the struct built 
by PersonalData. 


In this example, all key/value pairs are optional from the 
perspective of CDDL. With no occurrence indicator, an entry is 
mandatory. 




If the addition of more entries not specified by the current 
specification is desired, one can add this possibility explicitly: 


PersonalData = { 
  ? displayName: tstr, 
  NameComponents, 
  ? age: uint, 
  * tstr => any 
} 

NameComponents = ( 
  ? firstName: tstr, 
  ? familyName: tstr, 
) 


Figure 7: Personal Data: Example for Extensibility 


The CDDL tool described in Appendix F generated the following as one 
acceptable instance for this specification: 


{"familyName": "agust", "antiforeignism": "pretzel", 
 "springbuck": "illuminatingly", "exuviae": "ephemeris", 
 "kilometrage": "frogfish"} 


(See Section 3.9 for one way to explicitly identify an extension 
point.) 


3.5.2. Tables 


A table can be specified by defining a map with entries where the 
key type allows more than just a single value; for example: 


square-roots = {* x => y} 
x = int 
y = float 


Here, the key in each key/value pair has datatype x (defined as int), 
and the value has datatype y (defined as float). 


If the specification does not need to restrict one of x or y (i.e., 
the application is free to choose per entry), it can be replaced by 
the predefined name "any".


As another example, the following could be used as a conversion table
converting from an integer or float to a string:


tostring = {* mynumber => tstr}
mynumber = int / float
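

As a rough illustration of what the two tables above accept, the
following Python sketch checks an already-decoded map against a key
predicate and a value predicate. The function names, and the use of
Python's int, float, and str as stand-ins for the CDDL types int,
float, and tstr, are assumptions made for this sketch only.

```python
def is_int(x):
    # bool is a subclass of int in Python; CBOR booleans are not integers.
    return isinstance(x, int) and not isinstance(x, bool)

def is_float(x):
    return isinstance(x, float)

def matches_table(data, key_ok, value_ok):
    """True if `data` is a map whose every entry satisfies both predicates."""
    return isinstance(data, dict) and all(
        key_ok(k) and value_ok(v) for k, v in data.items())

def matches_square_roots(data):      # table with int keys and float values
    return matches_table(data, is_int, is_float)

def matches_tostring(data):          # table with int/float keys and text values
    return matches_table(data,
                         lambda k: is_int(k) or is_float(k),
                         lambda v: isinstance(v, str))
```

Because the occurrence indicator is "*", an empty map is accepted by
both checks.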


3.5.3.  Non-deterministic Order 


While the way arrays are matched is fully determined by the PEG 
formalism (see Appendix A), matching is more complicated for maps, as 
maps do not have an inherent order. For each candidate name/value 
pair that the PEG algorithm would try, a matching member is picked 
out of the entire map. For certain group expressions, more than one 
member in the map may match. Most often, this is inconsequential, as 
the group expression tends to consume all matches: 


labeled-values = {
  ? fritz: number,
  * label => value
}
label = text
value = number


Here, if any member with the key "fritz" is present, this will be 
picked by the first entry of the group; all remaining text/number 
members will be picked by the second entry (and if anything remains 
unpicked, the map does not match). 


However, it is possible to construct group expressions where what is 
actually picked is indeterminate, but does matter: 


do-not-do-this = { 
int => int, 
int => 6, 

} 


When this expression is matched against "{3: 5, 4: 6}", the first 
group entry might pick off the "3: 5", leaving "4: 6" for matching 
the second one. Or it might pick off "4: 6", leaving nothing for the 
second entry. This pathological non-determinism is caused by 
specifying "more general" before "more specific" and by having a 
general rule that only consumes a subset of the map key/value pairs 
that it is able to match -- both tend not to occur in real-world 
specifications of maps. At the time of writing, CDDL tools cannot
detect such cases automatically, and for the present version of the
CDDL specification, the specification writer is simply urged to not
write pathologically non-deterministic specifications.


(The astute reader will be reminded of what was called "ambiguous
content models" in the Standard Generalized Markup Language (SGML)
and "non-deterministic content models" in XML. That problem is
related to the one described here, but the problem here is
specifically caused by the lack of order in maps, something that the
XML schema languages do not have to contend with. Note that
RELAX NG's "interleave" pattern handles lack of order explicitly on
the specification side, while the instances in XML always have
determinate order.)
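

The indeterminacy of the "do-not-do-this" example can be made visible
with a small Python sketch. It is not how any particular CDDL tool
works; it assumes a naive matcher that lets each group entry take the
first member that fits, and it tries every order in which the members
of the map might be visited. All names are hypothetical.

```python
from itertools import permutations

def is_int(x):
    return isinstance(x, int) and not isinstance(x, bool)

# Each group entry is a (key_predicate, value_predicate) pair; both
# entries occur exactly once, as in "do-not-do-this".
ENTRIES = [
    (is_int, is_int),                # int => int
    (is_int, lambda v: v == 6),      # int => 6
]

def naive_match(members, entries=ENTRIES):
    """Match entries in order; each takes the first member that fits.

    `members` is a list of (key, value) pairs in the order the matcher
    happens to visit them; the map itself prescribes no such order.
    """
    remaining = list(members)
    for key_ok, value_ok in entries:
        for pair in remaining:
            if key_ok(pair[0]) and value_ok(pair[1]):
                remaining.remove(pair)
                break
        else:
            return False             # no member left for this entry
    return not remaining             # unpicked members make the map fail

def outcomes(mapping):
    """Collect the results over every possible visiting order."""
    return {naive_match(order) for order in permutations(mapping.items())}
```

For the map {3: 5, 4: 6}, outcomes() contains both True and False,
which is exactly the situation the text warns against.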


3.5.4. Cuts in Maps 


The extensibility idiom discussed above for structs has one problem: 


extensible-map-example = {
  ? "optional-key" => int,
  * tstr => any
}


In this example, there is one optional key "optional-key", which,
when present, maps to an integer. There is also a wildcard for any
future additions.


Unfortunately, the data item


{ "optional-key": "nonsense" }


does match this specification: while the first entry of the group
does not match, the second one (the wildcard) does. This may very
well be desirable (e.g., if a future extension is to be allowed to
extend the type of "optional-key"), but in many cases it isn't.


In anticipation of a more general potential feature called "cuts", 
CDDL allows inserting a cut "^" into the definition of the map entry: 


extensible-map-example = {
  ? "optional-key" ^ => int,
  * tstr => any
}


A cut in this position means that once the member key matches the
name part of an entry that carries a cut, other potential matches for
the key of the member that occur in later entries in the group of the
map are no longer allowed. In other words, when a group entry would
pick a key/value pair based on just a matching key, it "locks in" the
pick -- this rule applies, independently of whether the value matches
as well, so when it does not, the entire map fails to match. In
summary, the example above no longer matches the specification as
modified with the cut.
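

The effect of the cut on this one example can be sketched in Python
as follows. The function is hand-written for the two-entry map above
and is not a general CDDL matcher; its name and the boolean "cut"
parameter are assumptions of this sketch.

```python
def is_int(x):
    return isinstance(x, int) and not isinstance(x, bool)

def matches_extensible_map(data, cut):
    """Match `data` against a map with the two entries
         ? "optional-key" => int     (carrying a cut "^" if `cut` is true)
         * tstr => any
    """
    remaining = dict(data)
    if "optional-key" in remaining:
        if is_int(remaining["optional-key"]):
            del remaining["optional-key"]   # picked by the first entry
        elif cut:
            return False                    # key is locked in, value is wrong
    # Wildcard entry: every member still unpicked needs a text-string key.
    return all(isinstance(k, str) for k in remaining)
```

Without the cut, {"optional-key": "nonsense"} falls through to the
wildcard and is accepted; with the cut, it is rejected.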


Since the desire for this kind of exclusive matching is so frequent, 
the ":" shortcut is actually defined to include the cut semantics. 
So, the preceding example (including the cut) can be written more 
simply as: 


extensible-map-example = {
  ? "optional-key": int,
  * tstr => any
}


or even shorter, using a bareword for the key:


extensible-map-example = {
  ? optional-key: int,
  * tstr => any
}


3.6. Tags 


A type can make use of a CBOR tag (major type 6) by using the 
representation type notation, giving #6.nnn(type) where nnn is an 
unsigned integer giving the tag number and "type" is the type of the 
data item being tagged. 


For example, the following line from the CDDL prelude (Appendix D) 
defines "biguint" as a type name for an unsigned bignum N: 


biguint = #6.2(bstr) 


The tags defined by [RFC7049] are included in the prelude.
Additional tags registered since [RFC7049] was written need to be
added to a CDDL specification as needed; e.g., a binary Universally
Unique Identifier (UUID) tag could be referenced as "buuid" in a
specification after defining


buuid = #6.37(bstr)


In the following example, usage of tag 32 for URIs is optional:


my_uri = #6.32(tstr) / tstr


3.7. Unwrapping


The group that is used to define a map or an array can often be
reused in the definition of another map or array. Similarly, a type
defined as a tag carries an internal data item that one would like to
refer to. In these cases, it is expedient to simply use the name of
the map, array, or tag type as a handle for the group or type defined
inside it.


The "unwrap" operator (written by preceding a name by a tilde
character "~") can be used to strip the type defined for a name by
one layer, exposing the underlying group (for maps and arrays) or
type (for tags).


For example, an application might want to define a basic header and 
an advanced header. Without unwrapping, this might be done as 
follows: 


basic-header-group = (
  field1: int,
  field2: text,
)


basic-header = [ basic-header-group ]


advanced-header = [
  basic-header-group,
  field3: bytes,
  field4: number, ; as in the tagged type "time"
]


Unwrapping simplifies this to: 


basic-header = [
  field1: int,
  field2: text,
]


advanced-header = [
  ~basic-header,
  field3: bytes,
  field4: ~time,
]


(Note that leaving out the first unwrap operator in the latter
example would lead to nesting the basic-header in its own array
inside the advanced-header, while, with the unwrapped basic-header,
the definition of the group inside basic-header is essentially
repeated inside advanced-header, leading to a single array. This can
be used for various applications often solved by inheritance in
programming languages. The effect of unwrapping can also be
described as "threading in" the group or type inside the referenced
type, which suggested the thread-like "~" character.)


3.8. Controls 


A control allows relating a target type with a controller type
via a control operator.


The syntax for a control type is "target .control-operator
controller", where control operators are special identifiers prefixed
by a dot. (Note that target or controller might need to be
parenthesized.)


A number of control operators are defined at this point. Further
control operators may be defined by new versions of this
specification or by registering them according to the procedures in
Section 6.1.


3.8.1. Control Operator .size 


A ".size" control controls the size of the target in bytes by the
control type. The control is defined for text and byte strings,
where it directly controls the number of bytes in the string. It is
also defined for unsigned integers (see below). Figure 8 shows
example usage for byte strings.


full-address = [[+ label], ip4, ip6]
ip4 = bstr .size 4
ip6 = bstr .size 16
label = bstr .size (1..63)


Figure 8: Control for Size in Bytes 


When applied to an unsigned integer, the ".size" control restricts
the range of that integer by giving a maximum number of bytes that
should be needed in a computer representation of that unsigned
integer. In other words, "uint .size N" is equivalent to
"0...BYTES_N", where BYTES_N == 256**N.


audio_sample = uint .size 3 ; 24-bit, equivalent to 0...16777216


Figure 9: Control for Integer Size in Bytes


Note that, as with value restrictions in CDDL, this control is not a
representation constraint; a number that fits into fewer bytes can
still be represented in that form, and an inefficient implementation
could use a longer form (unless that is restricted by some format
constraints outside of CDDL, such as the rules in Section 3.9 of
[RFC7049]).
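

The arithmetic behind "uint .size N" can be written down as a short
Python sketch. The function names are hypothetical; the only thing
taken from the text is that the permitted range runs from 0 up to,
but not including, 256**N.

```python
def uint_size_bounds(n):
    """Return (low, high_exclusive) for "uint .size n", i.e. 0...256**n."""
    if n < 0:
        raise ValueError("size must be non-negative")
    return 0, 256 ** n

def matches_uint_size(value, n):
    """True if `value` is an unsigned integer that fits into n bytes."""
    low, high = uint_size_bounds(n)
    return (isinstance(value, int) and not isinstance(value, bool)
            and low <= value < high)
```

For N = 3 the exclusive upper bound is 16777216, so 16777215 is the
largest value accepted.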


3.8.2. Control Operator .bits 


A ".bits" control on a byte string indicates that, in the target,
only the bits numbered by a number in the control type are allowed to
be set. (Bits are counted the usual way, bit number "n" being set in
"str" meaning that "(str[n >> 3] & (1 << (n & 7))) != 0".)
Similarly, a ".bits" control on an unsigned integer "i" indicates
that for all unsigned integers "n" where "(i & (1 << n)) != 0", "n"
must be in the control type.


tcpflagbytes = bstr .bits flags
flags = &(
  fin: 8,
  syn: 9,
  rst: 10,
  psh: 11,
  ack: 12,
  urg: 13,
  ece: 14,
  cwr: 15,
  ns: 0,
) / (4..7) ; data offset bits


rwxbits = uint .bits rwx
rwx = &(r: 2, w: 1, x: 0)


Figure 10: Control for What Bits Can Be Set
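

The bit-numbering rule quoted above translates directly into code.
The following Python sketch computes the set of bit numbers that are
set in a byte string or an unsigned integer and checks it against an
allowed set. The two constants mirror the "flags" and "rwx"
definitions of Figure 10; the function and constant names are
assumptions of this sketch.

```python
def set_bits_bstr(data):
    """Bit numbers n with (data[n >> 3] & (1 << (n & 7))) != 0."""
    return {n for n in range(8 * len(data))
            if data[n >> 3] & (1 << (n & 7))}

def set_bits_uint(i):
    """Bit numbers n with (i & (1 << n)) != 0."""
    return {n for n in range(i.bit_length()) if i & (1 << n)}

# Named flags plus the data offset bits 4..7.
TCP_FLAG_BITS = {8, 9, 10, 11, 12, 13, 14, 15, 0} | set(range(4, 8))
RWX_BITS = {2, 1, 0}

def matches_bits(value, allowed):
    """True if every bit set in `value` is numbered in `allowed`."""
    if isinstance(value, bytes):
        bits = set_bits_bstr(value)
    else:
        bits = set_bits_uint(value)
    return bits <= allowed
```

Note that the check says nothing about the length of the byte string:
an empty string has no bits set and therefore passes.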


The CDDL tool described in Appendix F generates the following ten 
example instances for "tcpflagbytes": 


h'906d' h'01fc' h'8145' h'01b7' h'013d' h'409f' h'018e' h'c05f'
h'01fa' h'01fe'


These examples do not illustrate that the above CDDL specification
does not explicitly specify a size of two bytes: a valid all-clear
instance of flag bytes could be "h''" or "h'00'" or even "h'000000'"
as well.


3.8.3. Control Operator .regexp


A ".regexp" control indicates that the text string given as a target 
needs to match the XML Schema Definition (XSD) regular expression 
given as a value in the control type. XSD regular expressions are 
defined in Appendix F of [W3C.REC-xmlschema-2-20041028]. 


nai = tstr .regexp "[A-Za-z0-9]+@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)+"


Figure 11: Control with an XSD regexp


An example matching this regular expression:


"N1@CH57HF.4Znge0.dYJRN.igjf"


3.8.3.1. Usage Considerations

Note that XSD regular expressions do not support the usual \x or \u
escapes for hexadecimal expression of bytes or Unicode code points.
However, in CDDL the XSD regular expressions are contained in text
strings, the literal notation for which provides \u escapes; this
should suffice for most applications that use regular expressions for
text strings. (Note that this also means that there is one level of
string escaping before the XSD escaping rules are applied.)
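

The two levels of escaping can be illustrated with the "nai" pattern
of Figure 11. The Python sketch below first undoes the CDDL string
escape and then hands the result to a regular expression engine.
Python's "re" module is used here purely as a stand-in: it is not an
XSD regular expression implementation, and the two dialects differ in
places, although they agree on this simple pattern. Since XSD regular
expressions always match the whole string, re.fullmatch is used. The
variable and function names are hypothetical.

```python
import re

# The pattern exactly as written between the quotes in the CDDL source.
cddl_literal = r"[A-Za-z0-9]+@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)+"

# Level 1: CDDL string unescaping (only the "\\" escape occurs here).
pattern = cddl_literal.replace("\\\\", "\\")

# Level 2: the unescaped text is interpreted as a regular expression,
# in which "\." stands for a literal dot.
def is_nai(text):
    return re.fullmatch(pattern, text) is not None
```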


XSD regular expressions support character class subtraction, a 
feature often not found in regular expression libraries; 
specification writers may want to use this feature sparingly. 
Similar considerations apply to Unicode character classes; where 
these are used, the specification that employs CDDL SHOULD identify 
which Unicode versions are addressed. 


Other surprises for infrequent users of XSD regular expressions may 
include the following: 


o No direct support for case insensitivity. While case
insensitivity has gone mostly out of fashion in protocol design,
it is sometimes needed and then needs to be expressed manually as
in "[Cc][Aa][Ss][Ee]".


o The support for popular character classes such as \w and \d is
based on Unicode character properties; this is often not what is
desired in an ASCII-based protocol and thus might lead to
surprises. (\s and \S do have their more conventional meanings,
and "." matches any character but the line-ending characters \r
or \n.)


3.8.3.2. Discussion


There are many flavors of regular expression in use in the 
programming community. For instance, Perl-Compatible Regular 
Expressions (PCREs) are widely used and probably are more useful than 
XSD regular expressions. However, there is no normative reference 
for PCREs that could be used in the present document. Instead, we 
opt for XSD regular expressions for now. There is precedent for that 
choice in the IETF, e.g., in YANG [RFC7950]. 


Note that CDDL uses controls as its main extension point. This 
creates the opportunity to add further regular expression formats in 
addition to the one referenced here, if desired. As an example, a 
proposal for a ".pcre" control is defined in [CDDL-Freezer]. 


3.8.4. Control Operators .cbor and .cborseq 


A ".cbor" control on a byte string indicates that the byte string
carries a CBOR-encoded data item. Decoded, the data item matches the
type given as the right-hand-side argument (type1 in the following
example).


"bytes .cbor type1"


Similarly, a ".cborseq" control on a byte string indicates that the
byte string carries a sequence of CBOR-encoded data items. When the
data items are taken as an array, the array matches the type given as
the right-hand-side argument (type2 in the following example).


"bytes .cborseq type2"


(The conversion of the encoded sequence to an array can be effected,
for instance, by wrapping the byte string between the two bytes 0x9f
and 0xff and decoding the wrapped byte string as a CBOR-encoded
data item.)
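

The wrapping trick can be tried out with a deliberately tiny decoder.
The Python sketch below is an assumption-laden toy, not a CBOR
library: it understands only unsigned integers, definite-length text
strings, and indefinite-length arrays, which is just enough to show
how prefixing 0x9f and appending 0xff turns a sequence into a single
array item. All function names are hypothetical.

```python
def decode_item(buf, pos=0):
    """Decode one CBOR item from buf at pos; return (value, next_pos)."""
    initial = buf[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if major == 4 and info == 31:          # 0x9f: indefinite-length array
        items = []
        while buf[pos] != 0xFF:            # 0xff: "break" stop code
            value, pos = decode_item(buf, pos)
            items.append(value)
        return items, pos + 1
    if info < 24:                          # argument held in the initial byte
        arg = info
    elif info in (24, 25, 26, 27):         # argument in the next 1/2/4/8 bytes
        size = 1 << (info - 24)
        arg = int.from_bytes(buf[pos:pos + size], "big")
        pos += size
    else:
        raise ValueError("unsupported additional information")
    if major == 0:                         # unsigned integer
        return arg, pos
    if major == 3:                         # text string, definite length
        return buf[pos:pos + arg].decode("utf-8"), pos + arg
    raise ValueError("unsupported major type")

def cborseq_to_array(seq):
    """Wrap a CBOR sequence in 0x9f ... 0xff and decode it as one array."""
    wrapped = b"\x9f" + seq + b"\xff"
    value, end = decode_item(wrapped)
    if end != len(wrapped):
        raise ValueError("trailing bytes after the wrapped array")
    return value
```

The resulting list is what would then be matched against the type
given on the right-hand side of ".cborseq".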


3.8.5. Control Operators .within and .and


A ".and" control on a type indicates that the data item matches both
the left-hand-side type and the type given as the right-hand side.
(Formally, the resulting type is the intersection of the two types
given.)


"type1 .and type2"


A variant of the ".and" control is the ".within" control, which
expresses an additional intent: the left-hand-side type is meant to
be a subset of the right-hand-side type.


"type1 .within type2"


While both forms have the identical formal semantics (intersection),
the intention of the ".within" form is that the right-hand side gives
guidance to the types allowed on the left-hand side, which typically
is a socket (Section 3.9):


message = $message .within message-structure
message-structure = [message_type, *message_option]
message_type = 0..255
message_option = any


$message /= [3, dough: text, topping: [* text]]
$message /= [4, noodles: text, sauce: text, parmesan: bool]


For ".within", a tool might flag an error if type1 allows data items
that are not allowed by type2. In contrast, for ".and", there is no
expectation that type1 is already a subset of type2.


3.8.6. Control Operators .lt, .le, .gt, .ge, .eq, .ne, and .default 


The controls .lt, .le, .gt, .ge, .eq, and .ne specify a constraint 
on the left-hand-side type to be a value less than, less than or 
equal to, greater than, greater than or equal to, equal to, or not 
equal to a value given as a right-hand-side type (containing just 
that single value). In the present specification, the first four 
controls (.lt, .le, .gt, and .ge) are defined only for numeric types, 
as these have a natural ordering relationship. 


speed = number .ge 0 ; unit: m/s 


.ne and .eq are defined for both numeric values and values of other
types. If one of the values is not of a numeric type, equality is
determined as follows: text strings are equal (satisfy .eq / do not
satisfy .ne) if they are bytewise identical; the same applies for
byte strings. Arrays are equal if they have the same number of
elements, all of which are equal pairwise in order between the
arrays. Maps are equal if they have the same number of key/value
pairs, and there is pairwise equality between the key/value pairs
between the two maps. Tagged values are equal if they both have the
same tag and the values are equal. Values of simple types match if
they are the same values. Numeric types that occur within arrays,
maps, or tagged values are equal if their numeric value is equal and
they are both integers or both floating-point values. All other
cases are not equal (e.g., comparing a text string with a byte
string).
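

One possible reading of these equality rules is sketched below in
Python. Several choices are assumptions of the sketch rather than
statements of the text: decoded items are represented by Python
values, a hypothetical "Tagged" class stands for tagged values, only
booleans and None stand for simple values, and two numbers at the top
level are compared purely by numeric value, with the integer versus
floating-point distinction applied only inside arrays, maps, and
tagged values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tagged:
    tag: int
    value: object

def _is_number(x):
    return isinstance(x, (int, float)) and not isinstance(x, bool)

def eq(a, b, nested=False):
    """Equality in the sense of .eq; `nested` is true inside containers."""
    if _is_number(a) and _is_number(b):
        if nested and isinstance(a, float) != isinstance(b, float):
            return False                   # integer versus floating point
        return a == b
    if isinstance(a, str) and isinstance(b, str):
        return a.encode("utf-8") == b.encode("utf-8")   # bytewise identical
    if isinstance(a, bytes) and isinstance(b, bytes):
        return a == b
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(
            eq(x, y, True) for x, y in zip(a, b))
    if isinstance(a, dict) and isinstance(b, dict):
        return len(a) == len(b) and all(
            any(eq(k, k2, True) and eq(v, v2, True) for k2, v2 in b.items())
            for k, v in a.items())
    if isinstance(a, Tagged) and isinstance(b, Tagged):
        return a.tag == b.tag and eq(a.value, b.value, True)
    if isinstance(a, bool) or a is None:   # simple values
        return type(a) is type(b) and a == b
    return False                           # all other cases are not equal
```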


A variant of the ".ne" control is the ".default" control, which
expresses an additional intent: the value specified by the
right-hand-side type is intended as a default value for the
left-hand-side type given, and the implied .ne control is there to
prevent this value from being sent over the wire. This control is
only meaningful when the control type is used in an optional context;
otherwise, there would be no way to make use of the default value.


timer = {
  time: uint,
  ? displayed-step: (number .gt 0) .default 1
}
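

How a receiver might treat the "timer" example is sketched below in
Python. The sketch is hand-written for this one map; the function
names are hypothetical, and decoded maps are assumed to be Python
dictionaries with text-string keys.

```python
def is_number(x):
    return isinstance(x, (int, float)) and not isinstance(x, bool)

def matches_timer(data):
    """Check a decoded map against the "timer" definition."""
    if not isinstance(data, dict) or set(data) - {"time", "displayed-step"}:
        return False
    time = data.get("time")
    if not (isinstance(time, int) and not isinstance(time, bool)
            and time >= 0):
        return False
    if "displayed-step" in data:
        step = data["displayed-step"]
        # (number .gt 0) .default 1: positive, and not the default itself,
        # because .default implies .ne for the default value.
        if not (is_number(step) and step > 0 and step != 1):
            return False
    return True

def displayed_step(data):
    """Value the receiver works with: the transmitted one, else the default."""
    return data.get("displayed-step", 1)
```

The default value thus never appears on the wire, yet the receiver
still obtains 1 when the optional member is absent.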


3.9. Socket/Plug


For both type choices and group choices, a mechanism is defined that 
facilitates starting out with empty choices and assembling them 
later, potentially in separate files that are concatenated to build 
the full specification. 


Per convention, CDDL extension points are marked with a leading 
dollar sign (types) or two leading dollar signs (groups). Tools 
honor that convention by not raising an error if such a type or group 
is not defined at all; the symbol is then taken to be an empty type 
choice (group choice), i.e., no choice is available. 


tcp-header = {seq: uint, ack: uint, * $$tcp-option}


; later, in a different file


$$tcp-option //= (
  sack: [+(left: uint, right: uint)]
)


; and, maybe in another file


$$tcp-option //= (
  sack-permitted: true
)


Names that start with a single "$" are "type sockets", starting out
as an empty type, and intended to be extended via "/=". Names that
start with a double "$$" are "group sockets", starting out as an
empty group choice, and intended to be extended via "//=". In either
case, it is not an error if there is no definition for a socket at
all; this then means there is no way to satisfy the rule (i.e., the
choice is empty).


As a convention, all definitions (plugs) for socket names must be
augmentations, i.e., they must be using "/=" and "//=", respectively.


To pick up the example illustrated in Figure 7, the socket/plug 
mechanism could be used as shown in Figure 12: 


PersonalData = {
  ? displayName: tstr,
  NameComponents,
  ? age: uint,
  * $$personaldata-extensions
}


NameComponents = (
  ? firstName: tstr,
  ? familyName: tstr,
)


; The above already works as is.
; But then, we can add later:


$$personaldata-extensions //= (
  favorite-salsa: tstr,
)


; and again, somewhere else:


$$personaldata-extensions //= (
  shoesize: uint,
)


Figure 12: Personal Data Example: Using Socket/Plug Extensibility


3.10. Generics 


Using angle brackets, the left-hand side of a rule can add formal 
parameters after the name being defined, as in: 


messages = message<"reboot", "now"> / message<"sleep", 1..100>
message<t, v> = {type: t, value: v}


When using a generic rule, the formal parameters are bound to the
actual arguments supplied (also using angle brackets), within the
scope of the generic rule (as if there were a rule of the form
parameter = argument).


Generic rules can be used for establishing names for both types and 
groups. 


(At this time, there are some limitations to the nesting of generics 
in the CDDL tool described in Appendix F.) 


3.11. Operator Precedence 


As with any language that has multiple syntactic features such as 
prefix and infix operators, CDDL has operators that bind more tightly 
than others. This is becoming more complicated than, say, in ABNF, 
as CDDL has both types and groups, with operators that are specific 
to these concepts. Type operators (such as "/" for type choice) 
operate on types, while group operators (such as "//" for group 
choice) operate on groups. Types can simply be used in groups, but 
groups need to be bracketed (as arrays or maps) to become types. So, 
type operators naturally bind closer than group operators. 


For instance, in 


t = [group1]
group1 = (a / b // c / d)
a = 1 b = 2 c = 3 d = 4


group1 is a group choice between the type choice of a and b and the
type choice of c and d. This becomes more relevant once member keys
and/or occurrences are added in:


t = [group2]
group2 = (? ab: a / b // cd: c / d)
a = 1 b = 2 c = 3 d = 4


is a group choice between the optional member "ab" of type a or b and
the member "cd" of type c or d. Note that the optionality is
attached to the first choice ("ab"), not to the second choice.


Similarly, in


t = [group3]
group3 = (+ a / b / c)
a = 1 b = 2 c = 3


group3 is a repetition of a type choice between a, b, and c; if just
a is to be repeatable, a group choice is needed to focus the
occurrence:


t = [group4]
group4 = (+ a // b / c)
a = 1 b = 2 c = 3


group4 is a group choice between a repeatable a and a single b or c. 


A comment has been that the semantics of group3 could be 
counterintuitive. In general, as with many other languages with 
operator precedence rules, the specification writer is encouraged not 
to rely on them, but to insert parentheses liberally to guide readers 
that are not familiar with CDDL precedence rules: 


t = [group4a] 
group4a = ((+ a) // (b / c)) 
a = 1 b = 2 c = 3


The operator precedences, in sequence of loose to tight binding, are
defined in Appendix B and summarized in Table 1. (Arities given are
1 for unary prefix operators and 2 for binary infix operators.)


+----------+-------+---------------------------+------------+
| Operator | Arity | Operates on               | Precedence |
+----------+-------+---------------------------+------------+
| =        | 2     | name = type, name = group | 1          |
| /=       | 2     | name /= type              | 1          |
| //=      | 2     | name //= group            | 1          |
| //       | 2     | group // group            | 2          |
| ,        | 2     | group, group              | 3          |
| *        | 1     | * group                   | 4          |
| ?        | 1     | ? group                   | 4          |
| :        | 2     | name: type                | 5          |
| /        | 2     | type / type               | 6          |
| ..       | 2     | type..type                | 7          |
+----------+-------+---------------------------+------------+


Table 1: Summary of Operator Precedences


4. Making Use of CDDL


In this section, we discuss several potential ways to employ CDDL.


4.1. As a Guide for a Human User


CDDL can be used to efficiently define the layout of CBOR data, such
that a human implementer can easily see how data is supposed to be
encoded.


Since CDDL maps parts of the CBOR data to human-readable names, tools 
could be built that use CDDL to provide a human-friendly 
representation of the CBOR data and allow them to edit such data 
while remaining compliant with its CDDL definition. 


4.2. For Automated Checking of CBOR Data Structures 


CDDL has been specified such that a machine can handle the CDDL 
definition and related CBOR data (and, thus, also JSON data). For 
example, a machine could use CDDL to check whether or not CBOR data 
is compliant with its definition. 




The need for thoroughness of such compliance checking depends on the 
application. For example, an application may decide not to check the 
data structure at all and use the CDDL definition solely as a means 
to indicate the structure of the data to the programmer. 


On the other hand, the application may also implement a checking 
mechanism that goes as far as checking that all mandatory map members 
are available. 


The matter of how far the data description must be enforced by an 
application is left to the designers and implementers of that 
application, keeping in mind related security considerations. 


In no case is it intended that a CDDL tool would be "writing code" 
for an implementation. 


4.3. For Data Analysis Tools 


In the long run, it can be expected that more and more data will be 
stored using the CBOR data format. 


Where there is data, there is data analysis and the need to process 
such data automatically.  CDDL can be used for such automated data 
processing, allowing tools to verify data, clean it, and extract 
particular parts of interest from it. 


Since CBOR is designed with constrained devices in mind, a likely use 
of it would be small sensors. An interesting use would thus be 
automated analysis of sensor data. 


5. Security Considerations 


This document presents a content rules language for expressing CBOR 
data structures. As such, it does not bring any security issues on 
itself, although specifications of protocols that use CBOR naturally 
need security analyses when defined. General guidelines for writing 
security considerations are defined in [RFC3552] (BCP 72).
Specifications using CDDL to define CBOR structures in protocols need 
to follow those guidelines. Additional topics that could be 
considered in a security considerations section for a specification 
that uses CDDL to define CBOR structures include the following: 


o Where could the language maybe cause confusion in a way that will
enable security issues?


o Where a CDDL matcher is part of the implementation of a system,
the security of the system ought not depend on the correctness of 
the CDDL specification or CDDL implementation without any further 
defenses in place. 


o Where the CDDL specification includes extension points, the impact 
of extensions on the security of the system needs to be carefully 
considered. 


Writers of CDDL specifications are strongly encouraged to value 
clarity and transparency of the specification over its elegance. 
Keep it as simple as possible while still expressing the needed data 
model. 


A related observation about formal description techniques in general 
that is strongly recommended to be kept in mind by writers of CDDL 
specifications: just because CDDL makes it easier to handle
complexity in a specification, that does not make that complexity 
somehow less bad (except maybe on the level of the humans having to 
grasp the complex structure while reading the spec). 


6. IANA Considerations 


6.1. CDDL Control Operators Registry 


IANA has created a registry for control operators (Section 3.8). The 
"CDDL Control Operators" registry has been created within the 
"Concise Data Definition Language (CDDL)" registry. 


Each entry in the subregistry must include the name of the control 
operator (by convention given with the leading dot) and a reference 
to its documentation. Names must be composed of the leading dot 
followed by a text string conforming to the production "id" in
Appendix B.


Initial entries in this registry are as follows:


+----------+---------------+
| Name     | Documentation |
+----------+---------------+
| .size    | RFC 8610      |
| .bits    | RFC 8610      |
| .regexp  | RFC 8610      |
| .cbor    | RFC 8610      |
| .cborseq | RFC 8610      |
| .within  | RFC 8610      |
| .and     | RFC 8610      |
| .lt      | RFC 8610      |
| .le      | RFC 8610      |
| .gt      | RFC 8610      |
| .ge      | RFC 8610      |
| .eq      | RFC 8610      |
| .ne      | RFC 8610      |
| .default | RFC 8610      |
+----------+---------------+


All other control operator names are Unassigned. 


The IANA policy for additions to this registry is "Specification 
Required" as defined in [RFC8126] (which involves an Expert Review) 
for names that do not include an internal dot and "IETF Review" for 
names that do include an internal dot. The expert reviewer is
specifically instructed that other Standards Development
Organizations (SDOs) may want to define control operators that are
specific to their fields (e.g., based on a binary syntax already in
use at the SDO); the review process should strive to facilitate such
an undertaking.


Appendix A. Parsing Expression Grammars (PEGs)


This appendix is normative.


Since the 1950s, many grammar notations are based on Backus-Naur Form 
(BNF), a notation for context-free grammars (CFGs) within Chomsky's 
generative system of grammars. The Augmented Backus-Naur Form (ABNF) 
[RFC5234], widely used in IETF specifications and also inspiring the 
syntax of CDDL, is an example of this. 


Generative grammars can express ambiguity well, but this very 
property may make them hard to use in recognition systems, spawning a 
number of subdialects that pose constraints on generative grammars to 
be used with parser generators; this scenario may be hard for the 
specification writer to manage.


PEGs [PEG] provide an alternative formal foundation for describing 
grammars that emphasizes recognition over generation and resolves 
what would have been ambiguity in generative systems by introducing 
the concept of "prioritized choice". 


The notation for PEGs is quite close to BNF, with the usual "Extended
BNF" features, such as repetition, added. However, where BNF uses
the unordered (symmetrical) choice operator "|" (incidentally notated
as "/" in ABNF), PEG provides a prioritized choice operator "/". The
two alternatives listed are to be tested in left-to-right order,
locking in the first successful match and disregarding any further
potential matches within the choice (but not disabling alternatives
in choices containing this choice, as a cut (Section 3.5.4) would).


For example, the ABNF expressions


A = "a" "b" / "a"    (1)


and


A = "a" / "a" "b"    (2)


are equivalent in ABNF's original generative framework but are very 
different in PEG: in (2), the second alternative will never match, as 
any input string starting with an "a" will already succeed in the 
first alternative, locking in the match. 


Similarly, the occurrence indicators ("?", "*", "+") are "greedy" in 
PEG, i.e., they consume as much input as they match (and, as a 
consequence, "a* a" in PEG notation or "*a a" in CDDL syntax never 
can match anything, as all input matching "a" is already consumed by 
the initial "a*", leaving nothing to match the second "a").


Incidentally, the grammar of CDDL itself, as written in ABNF in
Appendix B, can be interpreted both (1) in the generative framework 
on which RFC 5234 is based and (2) as a PEG. This was made possible 
by ordering the choices in the grammar such that a successful match 
made on the left-hand side of a "/" operator is always the intended 
match, instead of relying on the power of symmetrical choices (for 
example, note the sequence of alternatives in the rule for "uint", 
where the lone zero is behind the longer match alternatives that 
start with a zero). 
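

The behavior of prioritized choice and greedy repetition described in
this appendix can be reproduced with a few parser combinators. The
Python sketch below is a minimal illustration over character strings,
not the matcher of any CDDL tool; each parser takes a text and a
position and returns the new position or None. All names are
hypothetical.

```python
def lit(s):
    """Match the literal string s."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def seq(*parsers):
    """Match the given parsers one after the other."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    """Prioritized choice: the first alternative that succeeds is locked in."""
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse

def star(p):
    """Greedy repetition: consume every match, never give any back."""
    def parse(text, pos):
        while True:
            end = p(text, pos)
            if end is None or end == pos:
                return pos
            pos = end
    return parse

def full_match(p, text):
    return p(text, 0) == len(text)

rule1 = choice(seq(lit("a"), lit("b")), lit("a"))   # expression (1)
rule2 = choice(lit("a"), seq(lit("a"), lit("b")))   # expression (2)
a_star_a = seq(star(lit("a")), lit("a"))            # "a* a" in PEG notation
```

With these definitions, rule1 accepts both "ab" and "a", rule2 never
gets past the first "a" of "ab", and a_star_a fails on any input.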


The syntax used for expressing the PEG component of CDDL is based on
ABNF, interpreted in the obvious way with PEG semantics. The ABNF
convention of notating occurrence indicators before the controlled
primary, and of allowing numeric values for minimum and maximum
occurrence around a "*" sign, is copied. While PEG is only about
characters, CDDL has a richer set of elements, such as types and
groups. Specifically, the following constructs map:


+-------+-------+-------------------------------------------+
| CDDL  | PEG   | Remark                                    |
+-------+-------+-------------------------------------------+
| "="   | "<-"  | /= and //= are abbreviations              |
| "//"  | "/"   | prioritized choice                        |
| "/"   | "/"   | prioritized choice, limited to types only |
| "?" P | P "?" | zero or one                               |
| "*" P | P "*" | zero or more                              |
| "+" P | P "+" | one or more                               |
| A B   | A B   | sequence                                  |
| A, B  | A B   | sequence, comma is decoration only        |
+-------+-------+-------------------------------------------+


The literal notation and the use of square brackets, curly braces,
tildes, ampersands, and hash marks are specific to CDDL and unrelated
to the conventional PEG notation. The DOT (".") from PEG is replaced
by the unadorned "#" or its alias "any". Also, CDDL does not provide
the syntactic predicate operators NOT ("!") or AND ("&") from PEG,
reducing expressiveness as well as complexity.


For more details about PEG's theoretical foundation and interesting 
properties of the operators such as associativity and distributivity, 
the reader is referred to [PEG]. 

Appendix B.  ABNF Grammar


This appendix is normative. 


The following is a formal definition of the CDDL syntax in ABNF
[RFC5234]. Note that, as is defined in ABNF, the quote-delimited
strings below are case insensitive (while string values and names are
case sensitive in CDDL).

cddl = S 1*(rule S)
rule = typename [genericparm] S assignt S type
     / groupname [genericparm] S assigng S grpent

typename = id
groupname = id

assignt = "=" / "/="
assigng = "=" / "//="

genericparm = "<" S id S *("," S id S ) ">"
genericarg = "<" S type1 S *("," S type1 S ) ">"

type = type1 *(S "/" S type1)

type1 = type2 [S (rangeop / ctlop) S type2]
; space may be needed before the operator if type2 ends in a name

type2 = value
      / typename [genericarg]
      / "(" S type S ")"
      / "{" S group S "}"
      / "[" S group S "]"
      / "~" S typename [genericarg]
      / "&" S "(" S group S ")"
      / "&" S groupname [genericarg]
      / "#" "6" ["." uint] "(" S type S ")"
      / "#" DIGIT ["." uint]                ; major/ai
      / "#"                                 ; any

rangeop = "..." / ".."

ctlop = "." id

group = grpchoice *(S "//" S grpchoice)

grpchoice = *(grpent optcom)

grpent = [occur S] [memberkey S] type
       / [occur S] groupname [genericarg]  ; preempted by above
       / [occur S] "(" S group S ")"

memberkey = type1 S ["^" S] "=>"
          / bareword S ":"
          / value S ":"

bareword = id

optcom = S ["," S]

occur = [uint] "*" [uint]
      / "+"
      / "?"

uint = DIGIT1 *DIGIT
     / "0x" 1*HEXDIG
     / "0b" 1*BINDIG
     / "0"

value = number
      / text
      / bytes

int = ["-"] uint

; This is a float if it has fraction or exponent; int otherwise
number = hexfloat / (int ["." fraction] ["e" exponent ])
hexfloat = ["-"] "0x" 1*HEXDIG ["." 1*HEXDIG] "p" exponent
fraction = 1*DIGIT
exponent = ["+"/"-"] 1*DIGIT

text = %x22 *SCHAR %x22
SCHAR = %x20-21 / %x23-5B / %x5D-7E / %x80-10FFFD / SESC
SESC = "\" (%x20-7E / %x80-10FFFD)

bytes = [bsqual] %x27 *BCHAR %x27
BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / CRLF
bsqual = "h" / "b64"

id = EALPHA *(*("-" / ".") (EALPHA / DIGIT))
ALPHA = %x41-5A / %x61-7A
EALPHA = ALPHA / "@" / "_" / "$"
DIGIT = %x30-39
DIGIT1 = %x31-39
HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
BINDIG = %x30-31

S = *WS
WS = SP / NL
SP = %x20
NL = COMMENT / CRLF
COMMENT = ";" *PCHAR CRLF
PCHAR = %x20-7E / %x80-10FFFD
CRLF = %x0A / %x0D.0A


Figure 13: CDDL ABNF 


Note that this ABNF does not attempt to reflect the detailed rules of 
what can be in a prefixed byte string. 


Appendix C. Matching Rules 


This appendix is normative. 


In this appendix, we go through the ABNF syntax rules defined in 
Appendix B and briefly describe the matching semantics of each 
syntactic feature. In this context, an instance (data item) 
"matches" a CDDL specification if it is allowed by the CDDL 
specification; this is then broken down into parts of specifications 
(type and group expressions) and parts of instances (data items). 


cddl = S 1*(rule S)


A CDDL specification is a sequence of one or more rules. Each rule 
gives a name to a right-hand-side expression, either a CDDL type or a 
CDDL group. Rule names can be used in the rule itself and/or other 
rules (and tools can output warnings if that is not the case). The 
order of the rules is significant only in two cases: 


1. The first rule defines the semantics of the entire specification;
   hence, there is no need to give that root rule a special name or
   special syntax in the language (as, for example, with "start" in
   RELAX NG); its name can therefore be chosen to be descriptive.
   (As with all other rule names, the name of the initial rule may
   be used in itself or in other rules.)

2. Where a rule contributes to a type or group choice (using "/=" or
   "//="), that choice is populated in the order the rules are
   given; see below.


rule = typename [genericparm] S assignt S type 
/ groupname [genericparm] S assigng S grpent 


typename = id

groupname = id 

A rule defines a name for a type expression (production "type") or
for a group expression (production "grpent"), with the intention that
the semantics does not change when the name is replaced by its
(parenthesized if needed) definition. Note that whether the name
defined by a rule stands for a type or a group isn't always
determined by syntax alone: e.g., "a = b" can make "a" a type if "b"
is a type, or a group if "b" is a group. More subtly, in "a = (b)",
"a" may be used as a type if "b" is a type, or as a group both when
"b" is a group and when "b" is a type (a good convention to make the
latter case stand out to the human reader is to write "a = (b,)").
(Note that the same dual meaning of parentheses applies within an
expression but often can be resolved by the context of the
parenthesized expression. On the more general point, it may not be
clear immediately either whether "b" stands for a group or a type --
this semantic processing may need to span several levels of rule
definitions before a determination can be made.)


assignt = "=" / "/=" 
assigng = "=" / "//="


A plain equals sign defines the rule name as the equivalent of the
expression to the right; it is an error if the name was already
defined with a different expression. A "/=" or "//=" extends a named
type or a group by additional choices; a number of these could be
replaced by collecting all the right-hand sides and creating a single
rule with a type choice or a group choice built from the right-hand
sides in the order of the rules given. (It is not an error to extend
a rule name that has not yet been defined; this makes the right-hand
side the first entry in the choice being created.)
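
The collection step described above can be sketched in Python as
follows. This is a minimal illustration, not a CDDL parser: the
function name collect_choices is hypothetical, and each right-hand
side is treated as an opaque string.

```python
def collect_choices(rules):
    """Fold (name, operator, rhs) triples into a name -> alternatives map.

    The operator is "=", "/=" or "//="; alternatives are kept in the
    order in which the rules are given.
    """
    table = {}
    for name, operator, rhs in rules:
        if operator == "=":
            # A plain definition may only be repeated with the same expression.
            if name in table and table[name] != [rhs]:
                raise ValueError(f"{name} already defined differently")
            table[name] = [rhs]
        elif operator in ("/=", "//="):
            # Extending an undefined name makes rhs the first entry.
            table.setdefault(name, []).append(rhs)
        else:
            raise ValueError(f"unknown assignment operator {operator!r}")
    return table


rules = [
    ("colour", "/=", '"red"'),
    ("size", "=", "uint"),
    ("colour", "/=", '"green"'),
    ("colour", "/=", '"blue"'),
]
table = collect_choices(rules)
single_rule = "colour = " + " / ".join(table["colour"])
```

Here the three "/=" rules for the (made-up) name "colour" collapse
into one type choice whose alternatives appear in rule order.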


genericparm = "<" S id S *("," S id S ) ">" 
genericarg = "<" S type1 S *("," S type1 S ) ">"


Rule names can have generic parameters, which cause temporary
assignments within the right-hand sides to the parameter names from
the arguments given when citing the rule name.


type = type1 *(S "/" S type1)

A type can be given as a choice between one or more types. The
choice matches a data item if the data item matches any one of the
types given in the choice. The choice uses PEG semantics as
discussed in Appendix A: the first choice that matches wins. (As a
result, the order of rules that contribute to a single rule name can
very well matter.)
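
A rough Python sketch of "the first choice that matches wins" is
given here. The predicates and the function first_match are
assumptions made for the illustration; they stand in for CDDL types
and are not part of any CDDL implementation.

```python
def first_match(alternatives, item):
    """Return the label of the first alternative whose predicate accepts item."""
    for label, predicate in alternatives:
        if predicate(item):
            return label  # locked in; later alternatives are never tried
    return None


def is_int(x):
    return isinstance(x, int) and not isinstance(x, bool)


def is_number(x):
    return is_int(x) or isinstance(x, float)


wide_first = [("number", is_number), ("int", is_int)]
narrow_first = [("int", is_int), ("number", is_number)]

chosen_wide = first_match(wide_first, 3)
chosen_narrow = first_match(narrow_first, 3)
chosen_none = first_match(narrow_first, "three")
```

In this sketch, both orders accept the value 3; only the alternative
that is locked in differs.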


type1 = type2 [S (rangeop / ctlop) S type2]


Two types can be combined with a range operator (see below) or a 
control operator (see Section 3.8). 


type2 = value 

A type can be just a single value (such as 1 or "icecream" or 
h'0815'), which matches only a data item with that specific value (no 
conversions defined), 


/ typename [genericarg] 


or be defined by a rule giving a meaning to a name (possibly after
supplying generic arguments as required by the generic parameters),

/ "(" S type S ")"

or be defined in a parenthesized type expression (parentheses may be
necessary to override some operator precedence), or

/ "{" S group S "}"

a map expression, which matches a valid CBOR map the key/value pairs
of which can be ordered in such a way that the resulting sequence
matches the group expression, or

/ "[" S group S "]"

an array expression, which matches a CBOR array the elements of which
-- when taken as values and complemented by a wildcard (matches
anything) key each -- match the group, or

/ "~" S typename [genericarg]

an "unwrapped" group (see Section 3.7), which matches the group
inside a type defined as a map or an array by wrapping the group, or

/ "&" S "(" S group S ")"
/ "&" S groupname [genericarg]

an enumeration expression, which matches any value that is within the
set of values that the values of the group given can take, or

/ "#" "6" ["." uint] "(" S type S ")"

a tagged data item, tagged with the "uint" given and containing the
type given as the tagged value, or

/ "#" DIGIT ["." uint] ; major/ai

a data item of a major type (given by the DIGIT), optionally
constrained to the additional information given by the uint, or

/ "#" ; any

any data item.


rangeop = "..." / ".."

A range operator can be used to join two type expressions that stand
for either two integer values or two floating-point values; it
matches any value that is between the two values, where the first
value is always included in the matching set and the second value is
included for ".." and excluded for "...".
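
The inclusive and exclusive forms can be contrasted in a short Python
sketch. The function in_range is a hypothetical helper; it ignores
the requirement that both ends be of the same kind (integer or
floating point).

```python
def in_range(value, low, high, operator):
    """Return True if value is matched by the range `low operator high`."""
    if operator == "..":
        return low <= value <= high   # upper end included
    if operator == "...":
        return low <= value < high    # upper end excluded
    raise ValueError(f"not a range operator: {operator!r}")


upper_inclusive = in_range(10, 1, 10, "..")
upper_exclusive = in_range(10, 1, 10, "...")
lower_always = in_range(1, 1, 10, "...")
```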

ctlop = "." id


A control operator ties a target type to a controller type as 
defined in Section 3.8. Note that control operators are an extension 
point for CDDL; additional documents may want to define additional 
control operators. 


group = grpchoice *(S "//" S grpchoice) 


A group matches any sequence of key/value pairs that matches any of 
the choices given (again using PEG semantics). 


grpchoice = *(grpent optcom) 

Each of the component groups is given as a sequence of group entries. 
For a match, the sequence of key/value pairs given needs to match the 
sequence of group entries in the sequence given.

grpent = [occur S] [memberkey S] type 

A group entry can be given by a value type, which needs to be matched
by the value part of a single element; and, optionally, a memberkey
type, which needs to be matched by the key part of the element, if
the memberkey is given. If the memberkey is not given, the entry can
only be used for matching arrays, not for maps. (See below for how
that is modified by the occurrence indicator.)

/ [occur S] groupname [genericarg] ; preempted by above

A group entry can be built from a named group, or


/ [occur S] "(" S group S ")" 


from a parenthesized group, again with a possible occurrence 
indicator. 


memberkey = type1 S ["^" S] "=>"
/ bareword S ":" 
/ value S ":" 


Key types can be given by a type expression, a bareword (which stands 
for a type that just contains a string value created from this 
bareword), or a value (which stands for a type that just contains 
this value). A key value matches its key type if the key value is a 
member of the key type, unless a cut preceding it in the group 
applies (see Section 3.5.4 for how map matching is influenced by the 
presence of the cuts denoted by "^" or ":" in previous entries). 


bareword = id 

A bareword is an alternative way to write a type with a single text 
string value; it can only be used in the syntactic context given
above. 


optcom = S ["," S] 


(Optional commas do not influence the matching.) 


occur = [uint] "*" [uint]
      / "+"
      / "?"


An occurrence indicator modifies the group given to its right by 
requiring the group to match the sequence to be matched exactly for a 
certain number of times (see Section 3.2) in sequence, i.e., it acts 
as a (possibly infinite) group choice that contains choices with the 
group repeated each of the occurrences times. 
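
As an illustration, the repetition bounds implied by an occurrence
indicator can be computed with the following Python sketch. The
function occurrence_bounds is hypothetical and, as a simplification,
accepts only decimal numbers around the "*" (the grammar's "uint"
also allows hexadecimal and binary forms). An upper bound of None
stands for "unbounded".

```python
import re
from typing import Optional, Tuple

_OCCUR = re.compile(r"(\d*)\*(\d*)")


def occurrence_bounds(indicator: str) -> Tuple[int, Optional[int]]:
    """Map an occurrence indicator to (minimum, maximum) repetitions."""
    if indicator == "?":
        return (0, 1)
    if indicator == "+":
        return (1, None)
    match = _OCCUR.fullmatch(indicator)
    if match is None:
        raise ValueError(f"not an occurrence indicator: {indicator!r}")
    lower = int(match.group(1)) if match.group(1) else 0
    upper = int(match.group(2)) if match.group(2) else None
    return (lower, upper)


bounds = {ind: occurrence_bounds(ind) for ind in ("?", "*", "+", "2*5", "3*", "*4")}
```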

The rest of the ABNF describes syntax for value notation that should
be familiar to readers from programming languages, with the possible
exception of h'..' and b64'..' for byte strings, as well as syntactic
elements such as comments and line ends.

Appendix D. Standard Prelude


This appendix is normative. 


The following prelude is automatically added to each CDDL file.
(Note that technically, it is a postlude, as it does not disturb the
selection of the first rule as the root of the definition.)


any = #

uint = #0
nint = #1
int = uint / nint

bstr = #2
bytes = bstr
tstr = #3
text = tstr

tdate = #6.0(tstr)
time = #6.1(number)
number = int / float
biguint = #6.2(bstr)
bignint = #6.3(bstr)
bigint = biguint / bignint
integer = int / bigint
unsigned = uint / biguint
decfrac = #6.4([e10: int, m: integer])
bigfloat = #6.5([e2: int, m: integer])
eb64url = #6.21(any)
eb64legacy = #6.22(any)
eb16 = #6.23(any)
encoded-cbor = #6.24(bstr)
uri = #6.32(tstr)
b64url = #6.33(tstr)
b64legacy = #6.34(tstr)
regexp = #6.35(tstr)
mime-message = #6.36(tstr)
cbor-any = #6.55799(any)

float16 = #7.25
float32 = #7.26
float64 = #7.27
float16-32 = float16 / float32
float32-64 = float32 / float64
float = float16-32 / float64

false = #7.20
true = #7.21
bool = false / true
nil = #7.22
null = nil
undefined = #7.23

Figure 14: CDDL Prelude

Note that the prelude is deemed to be fixed. This means, for
instance, that additional tags beyond those defined in [RFC7049], as
registered, need to be defined in each CDDL file that is using them.


A common stumbling point is that the prelude does not define a type 
"string". CBOR has byte strings ("bytes" in the prelude) and text 
strings ("text"), so a type that is simply called "string" would be 
ambiguous. 


Appendix E. Use with JSON 


This appendix is normative. 


The JSON generic data model (implicit in [RFC8259]) is a subset of 
the generic data model of CBOR. So, one can use CDDL with JSON by 
limiting oneself to what can be represented in JSON.  Roughly 
speaking, this means leaving out byte strings, tags, and simple
values other than "false", "true", and "null", leading to the
following limited prelude:

any = #

uint = #0
nint = #1
int = uint / nint

tstr = #3
text = tstr

number = int / float

float16 = #7.25
float32 = #7.26
float64 = #7.27
float16-32 = float16 / float32
float32-64 = float32 / float64
float = float16-32 / float64

false = #7.20
true = #7.21
bool = false / true
nil = #7.22
null = nil


Figure 15: JSON-Compatible Subset of CDDL Prelude 


(The major types given here do not have a direct meaning in JSON, but 
they can be interpreted as CBOR major types translated through 
Section 4 of [RFC7049].) 


There are a few fine points in using CDDL with JSON. First, JSON 
does not distinguish between integers and floating-point numbers; 
there is only one kind of number (which may happen to be integral). 
In this context, specifying a type as "uint", "nint", or "int" then 
becomes a predicate that the number be integral. As an example, this 
means that the following JSON numbers are all matching "uint": 


10 10.0 1e1 1.0e1 100e-1 


(The fact that these are all integers may be surprising to users
accustomed to the long tradition in programming languages of using 
decimal points or exponents in a number to indicate a floating-point 
literal.) 
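
The integrality predicate can be sketched in Python by parsing JSON
numbers into exact decimals. The function json_matches_uint is a
hypothetical helper; it checks only that the value is a non-negative
integral number and, as a simplification, does not enforce the upper
bound of the CBOR "uint" range.

```python
import json
from decimal import Decimal


def json_matches_uint(text: str) -> bool:
    """Return True if the JSON text is a non-negative integral number."""
    value = json.loads(text, parse_int=Decimal, parse_float=Decimal)
    if not isinstance(value, Decimal):
        return False  # strings, booleans, null, arrays, objects
    return value >= 0 and value == value.to_integral_value()


examples = ["10", "10.0", "1e1", "1.0e1", "100e-1"]
all_match = all(json_matches_uint(text) for text in examples)
fraction_matches = json_matches_uint("10.5")
negative_matches = json_matches_uint("-1")
```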


CDDL distinguishes the various CBOR number types, but there is only 
one number type in JSON. The effect of specifying a floating-point 
precision (float16/float32/float64) is only to restrict the set of
permissible values to those expressible with binary16/binary32/
binary64; this is unlikely to be very useful when using CDDL for
specifying JSON data structures.


Fundamentally, the number system of JSON itself is based on decimal 
numbers and decimal fractions and does not have limits to its 
precision or range. In practice, JSON numbers are often parsed into 
a number type that is called "float64" here, creating a number of 
limitations to the generic data model [RFC7493]. In particular, this 
means that integers can only be expressed with interoperable
exactness when they lie in the range [-(2**53)+1, (2**53)-1] -- a
smaller range than that covered by CDDL "int".
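
The loss of exactness beyond this range can be observed with a short
Python sketch, assuming (as is the case for CPython) that the "float"
type is IEEE 754 binary64. The helper survives_float64 is a
hypothetical name.

```python
LIMIT = 2**53 - 1  # 9007199254740991


def survives_float64(n: int) -> bool:
    """True if n converts to a binary64 float and back without change."""
    return int(float(n)) == n


inside = survives_float64(LIMIT)
outside = survives_float64(2**53 + 1)
# Two distinct integers collapse onto the same float just above the limit.
collision = float(2**53) == float(2**53 + 1)
```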


JSON applications that want to stay compatible with I-JSON ("Internet 
JSON"; see [RFC7493]) may therefore want to define integer types with 
more limited ranges, such as in Figure 16. Note that the types given 
here are not part of the prelude; they need to be copied into the 
CDDL specification if needed. 


ij-uint = 0..9007199254740991 
ij-nint = -9007199254740991..-1 
ij-int = -9007199254740991..9007199254740991


Figure 16: I-JSON Types for CDDL (Not Part of Prelude) 


JSON applications that do not need to stay compatible with I-JSON and 
that actually may need to go beyond the 64-bit unsigned and negative 
integers supported by "int" (= "uint"/"nint") may want to use the
following additional types from the standard prelude, which are 
expressed in terms of tags but can straightforwardly be mapped into 
JSON (but not I-JSON) numbers: 


biguint = #6.2(bstr) 
bignint = #6.3(bstr)
bigint = biguint / bignint 
integer = int / bigint 
unsigned = uint / biguint 
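
A possible mapping between these tagged byte strings and JSON numbers
is sketched below in Python. It relies on the bignum encoding of
[RFC7049] (tag 2 carries an unsigned integer n as a big-endian byte
string; tag 3 stands for -1 - n); the function names are assumptions
made for this illustration.

```python
import json


def bignum_to_int(tag: int, content: bytes) -> int:
    """Decode the content of a tag 2 or tag 3 bignum into an integer."""
    n = int.from_bytes(content, "big")
    if tag == 2:
        return n
    if tag == 3:
        return -1 - n
    raise ValueError(f"not a bignum tag: {tag}")


def int_to_bignum(value: int):
    """Encode an integer as a (tag, byte string) bignum pair."""
    tag, n = (2, value) if value >= 0 else (3, -1 - value)
    length = (n.bit_length() + 7) // 8
    return tag, n.to_bytes(length, "big")


positive = int_to_bignum(2**64)
negative = int_to_bignum(-(2**64) - 1)
as_json = json.dumps(bignum_to_int(*positive))
```

The resulting JSON text is a plain decimal number, which is why such
values fit JSON but fall outside the I-JSON range.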


CDDL at this point does not have a way to express the unlimited 
floating-point precision that is theoretically possible with JSON; at 
the time of writing, this is rarely used in protocols in practice. 


Note that a data model described in CDDL is always restricted by what 
can be expressed in the serialization; e.g., floating-point values 
such as NaN (not a number) and the infinities cannot be represented 
in JSON even if they are allowed in the CDDL generic data model. 

Appendix F. A CDDL Tool

This appendix is for information only.


A rough CDDL tool is available. For CDDL specifications, it can 
check the syntax, generate one or more instances (expressed in CBOR 
diagnostic notation or in pretty-printed JSON), and validate an 
existing instance against the specification: 


Usage:
cddl spec.cddl generate [n]
cddl spec.cddl json-generate [n]
cddl spec.cddl validate instance.cbor
cddl spec.cddl validate instance.json

Figure 17: CDDL Tool Usage

Install on a system with a modern Ruby via:

gem install cddl

Figure 18: CDDL Tool Installation

The accompanying CBOR diagnostic tools (which are automatically
installed by the above) are described in
<https://github.com/cabo/cbor-diag>; they can be used to convert
between binary CBOR, a pretty-printed hexadecimal form of binary
CBOR, CBOR diagnostic notation, JSON, and YAML [YAML].


Appendix G. Extended Diagnostic Notation

This appendix is normative.


Section 6 of [RFC7049] defines a "diagnostic notation" in order to be 
able to converse about CBOR data items without having to resort to 
binary data. Diagnostic notation is based on JSON, with extensions 
for representing CBOR constructs such as binary data and tags. 


(Standardizing this together with the actual interchange format does 
not serve to create another interchange format but enables the use of 
a shared diagnostic notation in tools for and documents about CBOR.) 


This appendix discusses a few extensions to the diagnostic notation 
that have turned out to be useful since RFC 7049 was written. We 
refer to the result as Extended Diagnostic Notation (EDN). 

G.1. Whitespace in Byte String Notation


Examples often benefit from some whitespace (spaces, line breaks) in 
byte strings. In EDN, whitespace is ignored in prefixed byte 
strings; for instance, the following are equivalent:

h'48656c6c6f20776f726c64'
h'48 65 6c 6c 6f 20 77 6f 72 6c 64'
h'4 86 56c 6c6f
  20776 f726c64'


G.2. Text in Byte String Notation 


Diagnostic notation notates byte strings in one of the base encodings 
per [RFC4648], enclosed in single quotes, prefixed by >h< for base16,
>b32< for base32, >h32< for base32hex, or >b64< for base64 or
base64url. Quite often, byte strings carry bytes that are
meaningfully interpreted as UTF-8 text. EDN allows the use of single
quotes without a prefix to express byte strings with UTF-8 text; for
instance, the following are equivalent:

'hello world'
h'68656c6c6f20776f726c64'

The escaping rules of JSON strings are applied equivalently for
text-based byte strings, e.g., "\\" stands for a single backslash and
"\'" stands for a single quote. Whitespace is included literally,
i.e., the previous section does not apply to text-based byte strings.


G.3. Embedded CBOR and CBOR Sequences in Byte Strings 


Where a byte string is to carry an embedded CBOR-encoded item, or 
more generally a sequence of zero or more such items, the diagnostic 
notation for these zero or more CBOR data items, separated by commas, 
can be enclosed in << and >> to notate the byte string resulting from
encoding the data items and concatenating the result. For instance,
each pair of columns in the following are equivalent:

<<1>>              h'01'
<<1, 2>>           h'0102'
<<"foo", null>>    h'63666F6FF6'
<<>>               h''
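
The equivalences above can be reproduced with a deliberately tiny CBOR
encoder in Python. This is a sketch under strong assumptions: the
functions encode_item and embedded are hypothetical, and only null,
unsigned integers below 24, and text strings shorter than 24 bytes are
handled (the cases that fit into a single initial byte).

```python
def encode_item(item) -> bytes:
    """Encode a very small subset of CBOR data items."""
    if item is None:
        return b"\xf6"  # simple value null
    if isinstance(item, int) and not isinstance(item, bool) and 0 <= item < 24:
        return bytes([item])  # major type 0, value in the initial byte
    if isinstance(item, str):
        data = item.encode("utf-8")
        if len(data) < 24:
            return bytes([0x60 + len(data)]) + data  # major type 3
    raise ValueError("outside the scope of this sketch")


def embedded(*items) -> bytes:
    """Byte string content notated as << item, item, ... >>."""
    return b"".join(encode_item(item) for item in items)


one = embedded(1)
one_two = embedded(1, 2)
foo_null = embedded("foo", None)
empty = embedded()
```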

G.4.  Concatenated Strings


While the ability to include whitespace enables line-breaking of
encoded byte strings, a mechanism is needed to be able to include 
text strings as well as byte strings in direct UTF-8 representation 
into line-based documents (such as RFCs and source code). 


We extend the diagnostic notation by allowing multiple text strings 
or multiple byte strings to be notated separated by whitespace; these 
are then concatenated into a single text or byte string, 
respectively. Text strings and byte strings do not mix within such a 
concatenation, except that byte string notation can be used inside a 
sequence of concatenated text string notation to encode characters
that may be better represented in an encoded way. The following four
values are equivalent:

"Hello world"
"Hello " "world"
"Hello" h'20' "world"
"" h'48656c6c6f20776f726c64' ""


Similarly, the following byte string values are equivalent:

'Hello world'
'Hello ' 'world'
'Hello ' h'776f726c64'
'Hello' h'20' 'world'
'' h'48656c6c6f20776f726c64' '' b64''
h'4 86 56c 6c6f' h' 20776 f726c64'


(Note that the approach of separating by whitespace, while familiar 
from the C language, requires some attention -- a single comma makes 
a big difference here.) 
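
A Python sketch of the text string case follows. It assumes that the
segments have already been parsed, with text segments given as "str"
and byte string segments as "bytes"; the function concat_text is a
hypothetical name. All segments are joined as bytes first and decoded
once, so that a character split across encoded segments still
decodes.

```python
def concat_text(*segments) -> str:
    """Concatenate text segments, allowing byte string segments in between."""
    raw = b"".join(
        seg if isinstance(seg, bytes) else seg.encode("utf-8")
        for seg in segments
    )
    return raw.decode("utf-8")


plain = concat_text("Hello world")
split = concat_text("Hello ", "world")
mixed = concat_text("Hello", bytes.fromhex("20"), "world")
encoded = concat_text("", bytes.fromhex("48656c6c6f20776f726c64"), "")
```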

G.5. Hexadecimal, Octal, and Binary Numbers


In addition to JSON's decimal numbers, EDN provides hexadecimal, 
octal, and binary numbers in the usual C-language notation (octal 
with 0o prefix present only). 


The following are equivalent: 


4711 

0x1267 

0o11147
0b1001001100111 


As are: 


1.5
0x1.8p0 
0x18p-4 
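
These equivalences can be checked in Python, whose integer and
hexadecimal floating-point literals happen to use the same notation
for the values shown. This only illustrates the values; it is not an
EDN parser.

```python
# int(text, 0) infers the base from the 0x / 0o / 0b prefix.
integers = [int(text, 0) for text in ("4711", "0x1267", "0o11147", "0b1001001100111")]

# float.fromhex reads C-style hexadecimal floating-point notation.
floats = [1.5, float.fromhex("0x1.8p0"), float.fromhex("0x18p-4")]
```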


G.6. Comments


Longer pieces of diagnostic notation may benefit from comments.  JSON 
famously does not provide for comments, and basic diagnostic notation 
per RFC 7049 inherits this property. 


In EDN, comments can be included, delimited by slashes ("/"). Any 
text within and including a pair of slashes is considered a comment. 


Comments are considered whitespace. Hence, they are allowed in
prefixed byte strings; for instance, the following are equivalent:

h'68656c6c6f20776f726c64'
h'68 65 6c /doubled l!/ 6c 6f /hello/
  20 /space/
  77 6f 72 6c 64' /world/
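
One way to process such a commented hexadecimal byte string is
sketched below in Python. The function decode_commented_hex is a
hypothetical helper that takes the text between h' and the closing
quote, removes slash-delimited comments, removes the remaining
whitespace, and decodes the hexadecimal digits.

```python
import re


def decode_commented_hex(body: str) -> bytes:
    """Decode the inside of an h'...' string that may contain comments."""
    without_comments = re.sub(r"/[^/]*/", "", body)
    return bytes.fromhex("".join(without_comments.split()))


body = """68 65 6c /doubled l!/ 6c 6f /hello/
  20 /space/
  77 6f 72 6c 64"""
decoded = decode_commented_hex(body)
```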


This can be used to annotate a CBOR structure as in: 


/grasp-message/ [/M_DISCOVERY/ 1, /session-id/ 10584416,
                 /objective/ [/objective-name/ "opsonize",
                              /D, N, S/ 7, /loop-count/ 105]]


(There are currently no end-of-line comments. If we want to add
them, "//" sounds like a reasonable delimiter given that we already
use slashes for comments, but we could also go, for example,
for "#".)

Appendix H. Examples

This appendix is for information only.


This appendix contains a few examples of structures defined 
using CDDL. The theme for the examples is taken from [RFC7071], 
which defines certain JSON structures in English. For a similar 
example, it may also be of interest to examine Appendix A of 
[RFC8007], which contains a CDDL definition for a JSON structure 
defined in the main body of that RFC. 


These examples all happen to describe data that is interchanged in 
JSON. Examples for CDDL definitions of data that is interchanged in 
CBOR can be found in [RFC8152], [GRASP], and [RFC8428]. 


[RFC7071] defines the "reputon" structure for JSON using somewhat 
formalized English text. Here is a (somewhat verbose) equivalent 
definition using the same terms, but notated in CDDL: 


reputation-object = {
  reputation-context,
  reputon-list
}

reputation-context = (
  application: text
)

reputon-list = (
  reputons: reputon-array
)

reputon-array = [* reputon]

reputon = {
  rater-value,
  assertion-value,
  rated-value,
  rating-value,
  ? conf-value,
  ? normal-value,
  ? sample-value,
  ? gen-value,
  ? expire-value,
  * ext-value,
}

rater-value = ( rater: text )
assertion-value = ( assertion: text )
rated-value = ( rated: text )
rating-value = ( rating: float16 )
conf-value = ( confidence: float16 )
normal-value = ( normal-rating: float16 )
sample-value = ( sample-size: uint )
gen-value = ( generated: uint )
expire-value = ( expires: uint )
ext-value = ( text => any )


An equivalent, more compact form of this example would be: 


reputation-object = {
  application: text
  reputons: [* reputon]
}

reputon = {
  rater: text
  assertion: text
  rated: text
  rating: float16
  ? confidence: float16
  ? normal-rating: float16
  ? sample-size: uint
  ? generated: uint
  ? expires: uint
  * text => any
}


Note how this rather clearly delineates the structure somewhat
shrouded by so many words in Section 6.2.2 of [RFC7071]. Also, this
definition makes it clear that several ext-values are allowed (by
definition with different member names); RFC 7071 could be read to
forbid the repetition of ext-value ("A specific reputon-element
MUST NOT appear more than once" is ambiguous).

The CDDL tool described in Appendix F generates as one example:


{
  "application": "conchometry",
  "reputons": [
    {
      "rater": "Ephthianura",
      "assertion": "codding",
      "rated": "sphaerolitic",
      "rating": 0.34133473256800795,
      "confidence": 0.9481983064298332,
      "expires": 1568,
      "unplaster": "grassy"
    },
    {
      "rater": "nonchargeable",
      "assertion": "raglan",
      "rated": "alienage",
      "rating": 0.5724646875815566,
      "sample-size": 3514,
      "Aldebaran": "unchurched",
      "puruloid": "impersonable",
      "uninfracted": "pericarpoidal",
      "schorl": "Caro"
    },
    {
      "rater": "precollectable",
      "assertion": "Merat",
      "rated": "thermonatrite",
      "rating": 0.19164006323936977,
      "confidence": 0.6065252103391268,
      "normal-rating": 0.5187773690879303,
      "generated": 899,
      "speedy": "solidungular",
      "noviceship": "medicine",
      "checkrow": "epidictic"